You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@uniffle.apache.org by GitBox <gi...@apache.org> on 2023/01/11 08:41:27 UTC

[GitHub] [incubator-uniffle] xianjingfeng opened a new pull request, #470: [ISSUE-468] Put unavailable shuffle servers to the end of the server list when se…

xianjingfeng opened a new pull request, #470:
URL: https://github.com/apache/incubator-uniffle/pull/470

   …nding shuffle data
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://github.com/apache/incubator-uniffle/blob/master/CONTRIBUTING.md
     2. Ensure you have added or run the appropriate tests for your PR
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]XXXX Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue.
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   Put unavailable shuffle servers to the end of the server list when sending shuffle data if replica=1.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   If we use multiple replicas and the first shuffle server becomes unavailable, sending data will take a lot of time. Because the client will always send to the first shuffle server firstly. #468 
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   UT


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] kaijchen commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090142749


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,
+                         int replicaNum,
+                         List<ShuffleServerInfo> excludeServers,
+                         Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
+                         Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+                         boolean includeBlockList) {

Review Comment:
   Why is it always negated in line 223, should we name it `excludeBlocklist` instead?
   i.e. change `if (!includeBlockList && ...` to `if (excludeBlocklist && ...`



##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -105,6 +106,7 @@ public class ShuffleWriteClientImpl implements ShuffleWriteClient {
   private final ExecutorService dataTransferPool;
   private final int unregisterThreadPoolSize;
   private final int unregisterRequestTimeSec;
+  private Set<ShuffleServerInfo> shuffleServerBlockList;

Review Comment:
   should be `Blocklist` instead of `BlockList`.



##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,
+                         int replicaNum,
+                         List<ShuffleServerInfo> excludeServers,
+                         Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
+                         Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+                         boolean includeBlockList) {
+    if (replicaNum <= 0) {
+      return;
+    }
     int partitionId = sbi.getPartitionId();
     int shuffleId = sbi.getShuffleId();
+    int assignedNum = 0;
     for (ShuffleServerInfo ssi : serverList) {
+      if (!includeBlockList && replica > 1 && !shuffleServerBlockList.isEmpty()
+          && shuffleServerBlockList.contains(ssi)) {

Review Comment:
   `!shuffleServerBlockList.isEmpty() && shuffleServerBlockList.contains(ssi)` can be simplified to just `shuffleServerBlockList.contains(ssi)` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] kaijchen commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090144868


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,
+                         int replicaNum,
+                         List<ShuffleServerInfo> excludeServers,
+                         Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
+                         Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+                         boolean includeBlockList) {

Review Comment:
   I think `withBlocklist` or `withoutBlocklist` sounds better.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090280455


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,28 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
+  void genServerToBlocks(
+      ShuffleBlockInfo sbi,
+      List<ShuffleServerInfo> serverList,
+      int replicaNum,
+      List<ShuffleServerInfo> excludeServers,
       Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+      Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+      boolean withoutBlocklist) {
+    if (replicaNum <= 0) {
+      return;
+    }
     int partitionId = sbi.getPartitionId();
     int shuffleId = sbi.getShuffleId();
+    int assignedNum = 0;
     for (ShuffleServerInfo ssi : serverList) {
+      if (withoutBlocklist && replica > 1
+          && shuffleServerBlocklist.contains(ssi)) {

Review Comment:
   if replicaNum > writeReplicaNum, the blocks may will be sended for two rounds. And `shuffleServerBlocklist` need to be ignored when sending the second round or servers will not be enough.
   
   https://github.com/apache/incubator-uniffle/blob/b80972c226a3afd5568555a9a76fe3bb70ed3d9b/client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java#L249-L254



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090499863


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -105,6 +106,7 @@ public class ShuffleWriteClientImpl implements ShuffleWriteClient {
   private final ExecutorService dataTransferPool;
   private final int unregisterThreadPoolSize;
   private final int unregisterRequestTimeSec;
+  private Set<ShuffleServerInfo> shuffleServerBlocklist;

Review Comment:
   Why do we use the name `shuffleServerBlocklist`? What's the meaning of this variable?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090646461


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -105,6 +106,7 @@ public class ShuffleWriteClientImpl implements ShuffleWriteClient {
   private final ExecutorService dataTransferPool;
   private final int unregisterThreadPoolSize;
   private final int unregisterRequestTimeSec;
+  private Set<ShuffleServerInfo> shuffleServerBlocklist;

Review Comment:
   Block is a concept of our system. This name make me confusing.



##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -105,6 +106,7 @@ public class ShuffleWriteClientImpl implements ShuffleWriteClient {
   private final ExecutorService dataTransferPool;
   private final int unregisterThreadPoolSize;
   private final int unregisterRequestTimeSec;
+  private Set<ShuffleServerInfo> shuffleServerBlocklist;

Review Comment:
   Block is a concept of our system. This name make me confused.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090257676


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,
+                         int replicaNum,
+                         List<ShuffleServerInfo> excludeServers,
+                         Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
+                         Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+                         boolean includeBlockList) {
+    if (replicaNum <= 0) {
+      return;
+    }
     int partitionId = sbi.getPartitionId();
     int shuffleId = sbi.getShuffleId();
+    int assignedNum = 0;
     for (ShuffleServerInfo ssi : serverList) {
+      if (!includeBlockList && replica > 1 && !shuffleServerBlockList.isEmpty()

Review Comment:
   Updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090292986


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,28 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
+  void genServerToBlocks(
+      ShuffleBlockInfo sbi,
+      List<ShuffleServerInfo> serverList,
+      int replicaNum,
+      List<ShuffleServerInfo> excludeServers,
       Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+      Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+      boolean withoutBlocklist) {
+    if (replicaNum <= 0) {
+      return;
+    }
     int partitionId = sbi.getPartitionId();
     int shuffleId = sbi.getShuffleId();
+    int assignedNum = 0;
     for (ShuffleServerInfo ssi : serverList) {
+      if (withoutBlocklist && replica > 1
+          && shuffleServerBlocklist.contains(ssi)) {

Review Comment:
   My fault. Change to `withBlocklist`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1068876974


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,18 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // We can't modify allServers directly, because allServers is shared and is not thread safe
+        List<ShuffleServerInfo> newServerList = new ArrayList<>(allServers);
+        for (int i = newServerList.size() - 2; i >= 0; i--) {
+          ShuffleServerInfo serverInfo = newServerList.get(i);
+          if (shuffleServerBlacklist.contains(serverInfo)) {
+            newServerList.remove(i);
+            newServerList.add(serverInfo);
+          }
+        }
+        sbi.setShuffleServerInfos(newServerList);

Review Comment:
   It is just for UT now. And There may be other requirements need to modify this list in the future, such as #341.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090257857


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -105,6 +106,7 @@ public class ShuffleWriteClientImpl implements ShuffleWriteClient {
   private final ExecutorService dataTransferPool;
   private final int unregisterThreadPoolSize;
   private final int unregisterRequestTimeSec;
+  private Set<ShuffleServerInfo> shuffleServerBlockList;

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1091417801


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -105,6 +106,7 @@ public class ShuffleWriteClientImpl implements ShuffleWriteClient {
   private final ExecutorService dataTransferPool;
   private final int unregisterThreadPoolSize;
   private final int unregisterRequestTimeSec;
+  private Set<ShuffleServerInfo> shuffleServerBlocklist;

Review Comment:
   > Block is a concept of our system. This name make me confused.
   
   I think so too, but i have not good idea. How about defectiveServerList? We need to unify our opinions. :joy:   @jerqi  @kaijchen  @advancedxy 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #470: [ISSUE-468] Put unavailable shuffle servers to the end of the server list when se…

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1066943573


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,18 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // allServers is shared and is not thread safe
+        synchronized (this) {
+          for (int i = allServers.size() - 2; i >= 0; i--) {

Review Comment:
   Why do we use `2` here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090501134


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -216,6 +246,18 @@ private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> ser
         partitionToBlocks.put(partitionId, Lists.newArrayList());
       }
       partitionToBlocks.get(partitionId).add(sbi);
+      if (excludeServers != null) {
+        excludeServers.add(ssi);
+      }
+      assignedNum++;
+      if (assignedNum >= replicaNum) {
+        break;
+      }
+    }
+
+    if (assignedNum < replicaNum && withBlocklist) {
+      genServerToBlocks(sbi, serverList, replicaNum - assignedNum,

Review Comment:
   Do this cause that one shuffle server will allocate twice?



##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -216,6 +246,18 @@ private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> ser
         partitionToBlocks.put(partitionId, Lists.newArrayList());
       }
       partitionToBlocks.get(partitionId).add(sbi);
+      if (excludeServers != null) {
+        excludeServers.add(ssi);
+      }
+      assignedNum++;
+      if (assignedNum >= replicaNum) {
+        break;
+      }
+    }
+
+    if (assignedNum < replicaNum && withBlocklist) {
+      genServerToBlocks(sbi, serverList, replicaNum - assignedNum,

Review Comment:
   Do this cause that one shuffle server will be allocated twice?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] advancedxy commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1089915876


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -216,6 +245,12 @@ private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> ser
         partitionToBlocks.put(partitionId, Lists.newArrayList());
       }
       partitionToBlocks.get(partitionId).add(sbi);
+      if (excludeServers != null) {

Review Comment:
   > https://github.com/apache/incubator-uniffle/blob/8847ece10a21800e5deb608441fcc6129a6b841d/coordinator/src/main/java/org/apache/uniffle/coordinator/strategy/partition/ContinuousSelectPartitionStrategy.java#L30-L52
   
   See this impl and another strategy:  `RoundSelectPartitionStrategy`, I think these two impls might include duplicated nodes in the serverlist for some cases, such as:
   1. 3 replica with only 2 nodes?
   2. or other corner cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090126995


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,
+                         int replicaNum,
+                         List<ShuffleServerInfo> excludeServers,
+                         Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
+                         Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+                         boolean includeBlockList) {

Review Comment:
   shuffleServerBlockList -> shuffleServerDefectiveList? includeBlockList ->  includeDefectiveServer? WDYT @advancedxy @jerqi 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi merged pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi merged PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] kaijchen commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1091420887


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -105,6 +106,7 @@ public class ShuffleWriteClientImpl implements ShuffleWriteClient {
   private final ExecutorService dataTransferPool;
   private final int unregisterThreadPoolSize;
   private final int unregisterRequestTimeSec;
+  private Set<ShuffleServerInfo> shuffleServerBlocklist;

Review Comment:
   > I think so too, but i have not good idea. How about defectiveServerList? We need to unify our opinions. 😂 @jerqi @kaijchen @advancedxy
   
   Maybe just `defectiveServers`? Because it's actually a `Set`.
   And `withBlocklist` may be changed to `excludeDefectiveServers`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1089896166


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -216,6 +245,12 @@ private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> ser
         partitionToBlocks.put(partitionId, Lists.newArrayList());
       }
       partitionToBlocks.get(partitionId).add(sbi);
+      if (excludeServers != null) {

Review Comment:
   > It would be possible that some servers in the serverList are duplicated.
   
   Really? This is unreasonable and it may be a bug.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable shuffle servers to the end of the server list when se…

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1067667767


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,18 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // allServers is shared and is not thread safe
+        synchronized (this) {
+          for (int i = allServers.size() - 2; i >= 0; i--) {
+            ShuffleServerInfo serverInfo = allServers.get(i);
+            if (shuffleServerBlacklist.contains(serverInfo)) {

Review Comment:
   If all the servers are added the blacklist, the sending order will be third -> second -> first, it all servers are unavailable, the task will be failed. And if some servers become available, it will be removed from the blacklist.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #470: [ISSUE-468] Put unavailable shuffle servers to the end of the server list when se…

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1068850390


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,18 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // allServers is shared and is not thread safe
+        synchronized (this) {

Review Comment:
   Could we copy the allServers , modify the order of copy and then use the copy?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1068866788


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,18 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // allServers is shared and is not thread safe
+        synchronized (this) {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#issuecomment-1381244902

   > > …nding shuffle data
   > 
   > Could you modify your title and description?
   
   Done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable shuffle servers to the end of the server list when se…

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1067756896


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,18 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // allServers is shared and is not thread safe
+        synchronized (this) {

Review Comment:
   I think lock is necessary here, or we need to create a new server list. And this part will only be executed when `shuffleServerBlacklist` is empty, so i think there is little impact here. How about  `synchronized (allServers) `?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] advancedxy commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090151113


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,
+                         int replicaNum,
+                         List<ShuffleServerInfo> excludeServers,
+                         Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
+                         Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+                         boolean includeBlockList) {

Review Comment:
   > I think `withBlocklist` or `withoutBlocklist` sounds better.
   
   this sounds better.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] advancedxy commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090150106


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -216,6 +245,12 @@ private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> ser
         partitionToBlocks.put(partitionId, Lists.newArrayList());
       }
       partitionToBlocks.get(partitionId).add(sbi);
+      if (excludeServers != null) {

Review Comment:
   thanks. Didn't see that ealier.



##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -216,6 +245,12 @@ private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> ser
         partitionToBlocks.put(partitionId, Lists.newArrayList());
       }
       partitionToBlocks.get(partitionId).add(sbi);
+      if (excludeServers != null) {

Review Comment:
   thanks. Didn't see that earlier



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #470: [ISSUE-468] Put unavailable shuffle servers to the end of the server list when se…

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#issuecomment-1381227493

   > …nding shuffle data
   > 
   
   Could you modify your title and description?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1068881027


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,18 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // We can't modify allServers directly, because allServers is shared and is not thread safe
+        List<ShuffleServerInfo> newServerList = new ArrayList<>(allServers);
+        for (int i = newServerList.size() - 2; i >= 0; i--) {
+          ShuffleServerInfo serverInfo = newServerList.get(i);
+          if (shuffleServerBlacklist.contains(serverInfo)) {
+            newServerList.remove(i);
+            newServerList.add(serverInfo);
+          }
+        }
+        sbi.setShuffleServerInfos(newServerList);

Review Comment:
   This method isn't thread safe, too.



##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,18 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // We can't modify allServers directly, because allServers is shared and is not thread safe
+        List<ShuffleServerInfo> newServerList = new ArrayList<>(allServers);
+        for (int i = newServerList.size() - 2; i >= 0; i--) {
+          ShuffleServerInfo serverInfo = newServerList.get(i);
+          if (shuffleServerBlacklist.contains(serverInfo)) {
+            newServerList.remove(i);
+            newServerList.add(serverInfo);
+          }
+        }
+        sbi.setShuffleServerInfos(newServerList);

Review Comment:
   This method `setShuffleServerInfos` isn't thread safe, too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1089927987


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,

Review Comment:
   Could we keep the `indent code style` with other places?



##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,
+                         int replicaNum,
+                         List<ShuffleServerInfo> excludeServers,
+                         Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
+                         Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+                         boolean includeBlockList) {

Review Comment:
   Could you give `includeBlockList` a better name?



##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,
+                         int replicaNum,
+                         List<ShuffleServerInfo> excludeServers,
+                         Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
+                         Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+                         boolean includeBlockList) {
+    if (replicaNum <= 0) {
+      return;
+    }
     int partitionId = sbi.getPartitionId();
     int shuffleId = sbi.getShuffleId();
+    int assignedNum = 0;
     for (ShuffleServerInfo ssi : serverList) {
+      if (!includeBlockList && replica > 1 && !shuffleServerBlockList.isEmpty()

Review Comment:
   Will it allocate the servers which is less than `replicaNum` if exclude nodes are too many?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] advancedxy commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by GitBox <gi...@apache.org>.
advancedxy commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1070780483


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,12 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // We can't modify allServers directly, because allServers is shared and is not thread safe
+        List<ShuffleServerInfo> newServerList = new ArrayList<>(allServers);

Review Comment:
   This is a critical path and we are paying `copying the shuffle server then reordering the shuffle server` for every shuffle send request. Is it possible leverage the `shuffleServerBlackList` in the `genServerToBlocks` method?
   
   P.S.: to be more inclusive, we may replace `shuffleServerBlackList` to `shuffleServerBlockList`.



##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -311,6 +330,17 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     return new SendShuffleDataResult(successBlockIds, failedBlockIds);
   }
 
+  @VisibleForTesting
+  void moveBlacklistServersToTheEnd(List<ShuffleServerInfo> serverInfos) {

Review Comment:
   I noticed that some unit test refers this method. If possible, I would like to make this method a static one: `static moveBlacklistServersToTheEnd(List<ShuffleServerInfo> serverInfos, Set<ShuffleServerInfo> shuffleServerBlockList)`, and unit test this method. 
   
   However, if you refactor the `genServerToBlocks`, this method might be removed. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] kaijchen commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090653870


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -105,6 +106,7 @@ public class ShuffleWriteClientImpl implements ShuffleWriteClient {
   private final ExecutorService dataTransferPool;
   private final int unregisterThreadPoolSize;
   private final int unregisterRequestTimeSec;
+  private Set<ShuffleServerInfo> shuffleServerBlocklist;

Review Comment:
   Alternatives: `disallowlist`, `denylist`, `excludelist`.
   Or maybe just `blacklist`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] advancedxy commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1091430540


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -105,6 +106,7 @@ public class ShuffleWriteClientImpl implements ShuffleWriteClient {
   private final ExecutorService dataTransferPool;
   private final int unregisterThreadPoolSize;
   private final int unregisterRequestTimeSec;
+  private Set<ShuffleServerInfo> shuffleServerBlocklist;

Review Comment:
   +1 for defectiveServers



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable shuffle servers to the end of the server list when se…

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1067665760


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,18 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // allServers is shared and is not thread safe
+        synchronized (this) {
+          for (int i = allServers.size() - 2; i >= 0; i--) {

Review Comment:
   No matter if the last shuffle server is in blacklist, we don't need process it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #470: [ISSUE-468] Put unavailable shuffle servers to the end of the server list when se…

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1066944249


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,18 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // allServers is shared and is not thread safe
+        synchronized (this) {
+          for (int i = allServers.size() - 2; i >= 0; i--) {
+            ShuffleServerInfo serverInfo = allServers.get(i);
+            if (shuffleServerBlacklist.contains(serverInfo)) {

Review Comment:
   What will it happen if all the servers are added the blacklist?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] advancedxy commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1089889920


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -216,6 +245,12 @@ private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> ser
         partitionToBlocks.put(partitionId, Lists.newArrayList());
       }
       partitionToBlocks.get(partitionId).add(sbi);
+      if (excludeServers != null) {
+        excludeServers.add(ssi);
+      }
+      if (++assignedNum >= replicaNum) {

Review Comment:
   Nit: break this line into two lines for better readability? 
   ```
   assignedNum += 1;
   if (assignedNum >= replicaNum) {
   ...
   ```



##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -216,6 +245,12 @@ private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> ser
         partitionToBlocks.put(partitionId, Lists.newArrayList());
       }
       partitionToBlocks.get(partitionId).add(sbi);
+      if (excludeServers != null) {

Review Comment:
   Did some related code search. It would be possible that some servers in the serverList are duplicated.
   For current logic, the duplicated server would be selected once, is that expected.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] kaijchen commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090522375


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -105,6 +106,7 @@ public class ShuffleWriteClientImpl implements ShuffleWriteClient {
   private final ExecutorService dataTransferPool;
   private final int unregisterThreadPoolSize;
   private final int unregisterRequestTimeSec;
+  private Set<ShuffleServerInfo> shuffleServerBlocklist;

Review Comment:
   Blocklist is equal to Blacklist



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090258336


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090116463


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,
+                         int replicaNum,
+                         List<ShuffleServerInfo> excludeServers,
+                         Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
+                         Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+                         boolean includeBlockList) {

Review Comment:
   includeBlockServer or !skipBlockServer?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] kaijchen commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090653870


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -105,6 +106,7 @@ public class ShuffleWriteClientImpl implements ShuffleWriteClient {
   private final ExecutorService dataTransferPool;
   private final int unregisterThreadPoolSize;
   private final int unregisterRequestTimeSec;
+  private Set<ShuffleServerInfo> shuffleServerBlocklist;

Review Comment:
   `disallowlist`, `denylist`, `excludelist`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] codecov-commenter commented on pull request #470: [ISSUE-468] Put unavailable shuffle servers to the end of the server list when se…

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#issuecomment-1378425254

   # [Codecov](https://codecov.io/gh/apache/incubator-uniffle/pull/470?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#470](https://codecov.io/gh/apache/incubator-uniffle/pull/470?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ad45108) into [master](https://codecov.io/gh/apache/incubator-uniffle/commit/ebaff6aa4eccfe4f54a967bc271da7e345f6c34a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ebaff6a) will **decrease** coverage by `0.46%`.
   > The diff coverage is `82.35%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master     #470      +/-   ##
   ============================================
   - Coverage     58.74%   58.28%   -0.47%     
   + Complexity     1664     1497     -167     
   ============================================
     Files           199      184      -15     
     Lines         11236     9651    -1585     
     Branches        999      871     -128     
   ============================================
   - Hits           6601     5625     -976     
   + Misses         4243     3659     -584     
   + Partials        392      367      -25     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-uniffle/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...he/uniffle/client/impl/ShuffleWriteClientImpl.java](https://codecov.io/gh/apache/incubator-uniffle/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL2NsaWVudC9pbXBsL1NodWZmbGVXcml0ZUNsaWVudEltcGwuamF2YQ==) | `34.48% <82.35%> (+4.66%)` | :arrow_up: |
   | [...he/uniffle/server/storage/MultiStorageManager.java](https://codecov.io/gh/apache/incubator-uniffle/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL3NlcnZlci9zdG9yYWdlL011bHRpU3RvcmFnZU1hbmFnZXIuamF2YQ==) | `60.71% <0.00%> (-3.44%)` | :arrow_down: |
   | [...he/uniffle/server/storage/LocalStorageManager.java](https://codecov.io/gh/apache/incubator-uniffle/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL3NlcnZlci9zdG9yYWdlL0xvY2FsU3RvcmFnZU1hbmFnZXIuamF2YQ==) | `85.87% <0.00%> (-2.49%)` | :arrow_down: |
   | [...che/uniffle/server/storage/HdfsStorageManager.java](https://codecov.io/gh/apache/incubator-uniffle/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL3NlcnZlci9zdG9yYWdlL0hkZnNTdG9yYWdlTWFuYWdlci5qYXZh) | `93.75% <0.00%> (-1.49%)` | :arrow_down: |
   | [...he/uniffle/coordinator/CoordinatorGrpcService.java](https://codecov.io/gh/apache/incubator-uniffle/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29vcmRpbmF0b3Ivc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3VuaWZmbGUvY29vcmRpbmF0b3IvQ29vcmRpbmF0b3JHcnBjU2VydmljZS5qYXZh) | `2.08% <0.00%> (-0.02%)` | :arrow_down: |
   | [.../hadoop/mapreduce/task/reduce/RssEventFetcher.java](https://codecov.io/gh/apache/incubator-uniffle/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9oYWRvb3AvbWFwcmVkdWNlL3Rhc2svcmVkdWNlL1Jzc0V2ZW50RmV0Y2hlci5qYXZh) | | |
   | [.../hadoop/mapreduce/task/reduce/RssBypassWriter.java](https://codecov.io/gh/apache/incubator-uniffle/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9oYWRvb3AvbWFwcmVkdWNlL3Rhc2svcmVkdWNlL1Jzc0J5cGFzc1dyaXRlci5qYXZh) | | |
   | [...k/shuffle/writer/WrappedByteArrayOutputStream.java](https://codecov.io/gh/apache/incubator-uniffle/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvV3JhcHBlZEJ5dGVBcnJheU91dHB1dFN0cmVhbS5qYXZh) | | |
   | [...apache/hadoop/mapreduce/v2/app/RssMRAppMaster.java](https://codecov.io/gh/apache/incubator-uniffle/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9oYWRvb3AvbWFwcmVkdWNlL3YyL2FwcC9Sc3NNUkFwcE1hc3Rlci5qYXZh) | | |
   | [...org/apache/spark/shuffle/RssSparkShuffleUtils.java](https://codecov.io/gh/apache/incubator-uniffle/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9Sc3NTcGFya1NodWZmbGVVdGlscy5qYXZh) | | |
   | ... and [30 more](https://codecov.io/gh/apache/incubator-uniffle/pull/470?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1068880752


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,18 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // We can't modify allServers directly, because allServers is shared and is not thread safe
+        List<ShuffleServerInfo> newServerList = new ArrayList<>(allServers);
+        for (int i = newServerList.size() - 2; i >= 0; i--) {
+          ShuffleServerInfo serverInfo = newServerList.get(i);
+          if (shuffleServerBlacklist.contains(serverInfo)) {
+            newServerList.remove(i);
+            newServerList.add(serverInfo);
+          }
+        }
+        sbi.setShuffleServerInfos(newServerList);

Review Comment:
   For #341 , we can also decide which server to use here instead of calling method `setShuffleServerInfos`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] kaijchen commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090145619


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,
+                         int replicaNum,
+                         List<ShuffleServerInfo> excludeServers,
+                         Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
+                         Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+                         boolean includeBlockList) {

Review Comment:
   Speaking of the naming, I think `with/without` sounds better than `include/exclude`.
   For example, `withBlocklist`, `withoutBlocklist`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1091406937


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -216,6 +246,18 @@ private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> ser
         partitionToBlocks.put(partitionId, Lists.newArrayList());
       }
       partitionToBlocks.get(partitionId).add(sbi);
+      if (excludeServers != null) {
+        excludeServers.add(ssi);
+      }
+      assignedNum++;
+      if (assignedNum >= replicaNum) {
+        break;
+      }
+    }
+
+    if (assignedNum < replicaNum && withBlocklist) {
+      genServerToBlocks(sbi, serverList, replicaNum - assignedNum,

Review Comment:
   No. Assigned server will be added to `excludeServers`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1091569361


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -105,6 +106,7 @@ public class ShuffleWriteClientImpl implements ShuffleWriteClient {
   private final ExecutorService dataTransferPool;
   private final int unregisterThreadPoolSize;
   private final int unregisterRequestTimeSec;
+  private Set<ShuffleServerInfo> shuffleServerBlocklist;

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1089672143


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,12 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // We can't modify allServers directly, because allServers is shared and is not thread safe
+        List<ShuffleServerInfo> newServerList = new ArrayList<>(allServers);

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1068870197


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,18 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // We can't modify allServers directly, because allServers is shared and is not thread safe
+        List<ShuffleServerInfo> newServerList = new ArrayList<>(allServers);
+        for (int i = newServerList.size() - 2; i >= 0; i--) {
+          ShuffleServerInfo serverInfo = newServerList.get(i);
+          if (shuffleServerBlacklist.contains(serverInfo)) {
+            newServerList.remove(i);
+            newServerList.add(serverInfo);
+          }
+        }
+        sbi.setShuffleServerInfos(newServerList);

Review Comment:
   It seems that we don't need to call `setShuffleServerInfos`, we just need to call the `genServerToBlocks` with newServerList.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090258005


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,
+                         int replicaNum,
+                         List<ShuffleServerInfo> excludeServers,
+                         Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
+                         Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+                         boolean includeBlockList) {

Review Comment:
   Done



##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,
+                         int replicaNum,
+                         List<ShuffleServerInfo> excludeServers,
+                         Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
+                         Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+                         boolean includeBlockList) {
+    if (replicaNum <= 0) {
+      return;
+    }
     int partitionId = sbi.getPartitionId();
     int shuffleId = sbi.getShuffleId();
+    int assignedNum = 0;
     for (ShuffleServerInfo ssi : serverList) {
+      if (!includeBlockList && replica > 1 && !shuffleServerBlockList.isEmpty()
+          && shuffleServerBlockList.contains(ssi)) {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090257449


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,27 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
-      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+  void genServerToBlocks(ShuffleBlockInfo sbi,
+                         List<ShuffleServerInfo> serverList,
+                         int replicaNum,
+                         List<ShuffleServerInfo> excludeServers,
+                         Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
+                         Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+                         boolean includeBlockList) {

Review Comment:
   Done. change to `withoutBlocklist`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] kaijchen commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1090268671


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -192,12 +206,28 @@ private boolean sendShuffleDataAsync(
     return result;
   }
 
-  private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> serverList,
+  void genServerToBlocks(
+      ShuffleBlockInfo sbi,
+      List<ShuffleServerInfo> serverList,
+      int replicaNum,
+      List<ShuffleServerInfo> excludeServers,
       Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
-      Map<ShuffleServerInfo, List<Long>> serverToBlockIds) {
+      Map<ShuffleServerInfo, List<Long>> serverToBlockIds,
+      boolean withoutBlocklist) {
+    if (replicaNum <= 0) {
+      return;
+    }
     int partitionId = sbi.getPartitionId();
     int shuffleId = sbi.getShuffleId();
+    int assignedNum = 0;
     for (ShuffleServerInfo ssi : serverList) {
+      if (withoutBlocklist && replica > 1
+          && shuffleServerBlocklist.contains(ssi)) {

Review Comment:
   This condition looks strange to me: `withoutBlocklist && shuffleServerBlocklist.contains(ssi)`.
   Could you explain what `withoutBlocklist` is controlling here?
   Maybe the name of this parameter still needs to be changed.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1089918055


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -216,6 +245,12 @@ private void genServerToBlocks(ShuffleBlockInfo sbi, List<ShuffleServerInfo> ser
         partitionToBlocks.put(partitionId, Lists.newArrayList());
       }
       partitionToBlocks.get(partitionId).add(sbi);
+      if (excludeServers != null) {

Review Comment:
   The number of nodes is limited.
   
   https://github.com/apache/incubator-uniffle/blob/cd2662f64f0cd4d0e740ffe97f51adbd8ee6f2bd/coordinator/src/main/java/org/apache/uniffle/coordinator/strategy/assignment/PartitionBalanceAssignmentStrategy.java#L111-L113
   
   https://github.com/apache/incubator-uniffle/blob/cd2662f64f0cd4d0e740ffe97f51adbd8ee6f2bd/coordinator/src/main/java/org/apache/uniffle/coordinator/strategy/assignment/BasicAssignmentStrategy.java#L53-L55



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #470: [ISSUE-468] Put unavailable shuffle servers to the end of the server list when se…

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1067677077


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -246,6 +259,18 @@ public SendShuffleDataResult sendShuffleData(String appId, List<ShuffleBlockInfo
     // which is minimum number when there is at most *replicaWrite - replica* sending server failures.
     for (ShuffleBlockInfo sbi : shuffleBlockInfoList) {
       List<ShuffleServerInfo> allServers = sbi.getShuffleServerInfos();
+      if (replica > 1 && !shuffleServerBlacklist.isEmpty()) {
+        // allServers is shared and is not thread safe
+        synchronized (this) {

Review Comment:
   This lock will influence our performance. This is critical path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1091426951


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -105,6 +106,7 @@ public class ShuffleWriteClientImpl implements ShuffleWriteClient {
   private final ExecutorService dataTransferPool;
   private final int unregisterThreadPoolSize;
   private final int unregisterRequestTimeSec;
+  private Set<ShuffleServerInfo> shuffleServerBlocklist;

Review Comment:
   > > I think so too, but i have not good idea. How about defectiveServerList? We need to unify our opinions. 😂 @jerqi @kaijchen @advancedxy
   > 
   > Maybe just `defectiveServers`? Because it's actually a `Set`. And `withBlocklist` may be changed to `excludeDefectiveServers`.
   
   It's ok for me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] kaijchen commented on a diff in pull request #470: [ISSUE-468] Put unavailable servers to the end of the list when sending shuffle data

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on code in PR #470:
URL: https://github.com/apache/incubator-uniffle/pull/470#discussion_r1091420887


##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -105,6 +106,7 @@ public class ShuffleWriteClientImpl implements ShuffleWriteClient {
   private final ExecutorService dataTransferPool;
   private final int unregisterThreadPoolSize;
   private final int unregisterRequestTimeSec;
+  private Set<ShuffleServerInfo> shuffleServerBlocklist;

Review Comment:
   > I think so too, but i have not good idea. How about defectiveServerList? We need to unify our opinions. 😂 @jerqi @kaijchen @advancedxy
   
   Maybe just `defectiveServers`? Because it's actually a `Set`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org