You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@celeborn.apache.org by "FMX (via GitHub)" <gi...@apache.org> on 2023/02/15 07:42:46 UTC

[GitHub] [incubator-celeborn] FMX opened a new pull request, #1239: [CELEBORN-302] Fix workers count out of sync in HA mode.

FMX opened a new pull request, #1239:
URL: https://github.com/apache/incubator-celeborn/pull/1239

   ### What changes were proposed in this pull request?
   The worker's list might contain two identical workers if masters failed to connect the worker by occasion.
   
   
   ### Why are the changes needed?
   Change the worker's list to set to remove redundant workers.
   
   
   ### Does this PR introduce _any_ user-facing change?
   NO.
   
   
   ### How was this patch tested?
   UT.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] FMX commented on a diff in pull request #1239: [CELEBORN-302] Fix workers count out of sync in HA mode.

Posted by "FMX (via GitHub)" <gi...@apache.org>.
FMX commented on code in PR #1239:
URL: https://github.com/apache/incubator-celeborn/pull/1239#discussion_r1111665687


##########
master/src/main/java/org/apache/celeborn/service/deploy/master/clustermeta/AbstractMetaManager.java:
##########
@@ -51,7 +47,7 @@ public abstract class AbstractMetaManager implements IMetadataHandler {
   // Meta data for master service
   public final Set<String> registeredShuffle = ConcurrentHashMap.newKeySet();
   public final Set<String> hostnameSet = ConcurrentHashMap.newKeySet();
-  public final ArrayList<WorkerInfo> workers = new ArrayList<>();
+  public final Set<WorkerInfo> workers = new HashSet<>();

Review Comment:
   Updated. How about now?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] pan3793 commented on a diff in pull request #1239: [CELEBORN-302] Fix workers count out of sync in HA mode.

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #1239:
URL: https://github.com/apache/incubator-celeborn/pull/1239#discussion_r1106768628


##########
master/src/main/java/org/apache/celeborn/service/deploy/master/clustermeta/AbstractMetaManager.java:
##########
@@ -51,7 +47,7 @@ public abstract class AbstractMetaManager implements IMetadataHandler {
   // Meta data for master service
   public final Set<String> registeredShuffle = ConcurrentHashMap.newKeySet();
   public final Set<String> hostnameSet = ConcurrentHashMap.newKeySet();
-  public final ArrayList<WorkerInfo> workers = new ArrayList<>();
+  public final Set<WorkerInfo> workers = new HashSet<>();

Review Comment:
   The implementation of `WorkerInfo#equals` is fragile, e.g. what if the `obj` is not a `WorkerInfo` object?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] waitinfuture merged pull request #1239: [CELEBORN-302] Fix workers count out of sync in HA mode.

Posted by "waitinfuture (via GitHub)" <gi...@apache.org>.
waitinfuture merged PR #1239:
URL: https://github.com/apache/incubator-celeborn/pull/1239


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1239: [CELEBORN-302] Fix workers count out of sync in HA mode.

Posted by "waitinfuture (via GitHub)" <gi...@apache.org>.
waitinfuture commented on code in PR #1239:
URL: https://github.com/apache/incubator-celeborn/pull/1239#discussion_r1111891917


##########
common/src/main/scala/org/apache/celeborn/common/meta/WorkerInfo.scala:
##########
@@ -259,17 +259,22 @@ class WorkerInfo(
        |""".stripMargin
   }
 
-  override def equals(obj: Any): Boolean = {
-    val other = obj.asInstanceOf[WorkerInfo]
-    host == other.host &&
-    rpcPort == other.rpcPort &&
-    pushPort == other.pushPort &&
-    fetchPort == other.fetchPort &&
-    replicatePort == other.replicatePort
+  private def canEqual(other: Any): Boolean = other.isInstanceOf[WorkerInfo]
+
+  override def equals(other: Any): Boolean = other match {
+    case that: WorkerInfo =>
+      that.canEqual(this) &&

Review Comment:
   useless



##########
common/src/main/scala/org/apache/celeborn/common/meta/WorkerInfo.scala:
##########
@@ -259,17 +259,22 @@ class WorkerInfo(
        |""".stripMargin
   }
 
-  override def equals(obj: Any): Boolean = {
-    val other = obj.asInstanceOf[WorkerInfo]
-    host == other.host &&
-    rpcPort == other.rpcPort &&
-    pushPort == other.pushPort &&
-    fetchPort == other.fetchPort &&
-    replicatePort == other.replicatePort
+  private def canEqual(other: Any): Boolean = other.isInstanceOf[WorkerInfo]

Review Comment:
   useless



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] codecov[bot] commented on pull request #1239: [CELEBORN-302] Fix workers count out of sync in HA mode.

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #1239:
URL: https://github.com/apache/incubator-celeborn/pull/1239#issuecomment-1430895171

   # [Codecov](https://codecov.io/gh/apache/incubator-celeborn/pull/1239?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1239](https://codecov.io/gh/apache/incubator-celeborn/pull/1239?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (539fa60) into [main](https://codecov.io/gh/apache/incubator-celeborn/commit/4fb5b3d5479292eea866ca0123b3360e8568fa23?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4fb5b3d) will **decrease** coverage by `0.01%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##               main    #1239      +/-   ##
   ============================================
   - Coverage     27.24%   27.23%   -0.01%     
   + Complexity      808      807       -1     
   ============================================
     Files           212      212              
     Lines         17997    18000       +3     
     Branches       1964     1966       +2     
   ============================================
   - Hits           4902     4900       -2     
   - Misses        12772    12777       +5     
     Partials        323      323              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-celeborn/pull/1239?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...org/apache/celeborn/common/util/PbSerDeUtils.scala](https://codecov.io/gh/apache/incubator-celeborn/pull/1239?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29tbW9uL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvY2VsZWJvcm4vY29tbW9uL3V0aWwvUGJTZXJEZVV0aWxzLnNjYWxh) | `0.00% <ø> (ø)` | |
   | [...deploy/master/clustermeta/AbstractMetaManager.java](https://codecov.io/gh/apache/incubator-celeborn/pull/1239?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-bWFzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9jZWxlYm9ybi9zZXJ2aWNlL2RlcGxveS9tYXN0ZXIvY2x1c3Rlcm1ldGEvQWJzdHJhY3RNZXRhTWFuYWdlci5qYXZh) | `88.07% <100.00%> (ø)` | |
   | [...ice/deploy/master/clustermeta/ha/HARaftServer.java](https://codecov.io/gh/apache/incubator-celeborn/pull/1239?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-bWFzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9jZWxlYm9ybi9zZXJ2aWNlL2RlcGxveS9tYXN0ZXIvY2x1c3Rlcm1ldGEvaGEvSEFSYWZ0U2VydmVyLmphdmE=) | `76.58% <0.00%> (-1.35%)` | :arrow_down: |
   | [...apache/celeborn/service/deploy/master/Master.scala](https://codecov.io/gh/apache/incubator-celeborn/pull/1239?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-bWFzdGVyL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvY2VsZWJvcm4vc2VydmljZS9kZXBsb3kvbWFzdGVyL01hc3Rlci5zY2FsYQ==) | `0.00% <0.00%> (ø)` | |
   | [...born/client/commit/MapPartitionCommitHandler.scala](https://codecov.io/gh/apache/incubator-celeborn/pull/1239?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50L3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvY2VsZWJvcm4vY2xpZW50L2NvbW1pdC9NYXBQYXJ0aXRpb25Db21taXRIYW5kbGVyLnNjYWxh) | `0.00% <0.00%> (ø)` | |
   | [...oy/worker/congestcontrol/CongestionController.java](https://codecov.io/gh/apache/incubator-celeborn/pull/1239?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-d29ya2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9jZWxlYm9ybi9zZXJ2aWNlL2RlcGxveS93b3JrZXIvY29uZ2VzdGNvbnRyb2wvQ29uZ2VzdGlvbkNvbnRyb2xsZXIuamF2YQ==) | `77.78% <0.00%> (+0.93%)` | :arrow_up: |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org