Posted to issues@celeborn.apache.org by "AngersZhuuuu (via GitHub)" <gi...@apache.org> on 2023/04/26 06:46:15 UTC

[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1461: [CELEBORN-556][BUG] ReserveSlot should not use default RPC time out since register shuffle max timeout is network timeout

AngersZhuuuu opened a new pull request, #1461:
URL: https://github.com/apache/incubator-celeborn/pull/1461

   ### What changes were proposed in this pull request?
   Add a dedicated, configurable timeout for the ReserveSlots RPC (`celeborn.rpc.reserveSlots.askTimeout`) instead of using the default RPC timeout.
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1461: [CELEBORN-556][BUG] ReserveSlot should not use default RPC time out since register shuffle max timeout is network timeout

Posted by "waitinfuture (via GitHub)" <gi...@apache.org>.
waitinfuture commented on code in PR #1461:
URL: https://github.com/apache/incubator-celeborn/pull/1461#discussion_r1189332640


##########
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala:
##########
@@ -2493,6 +2501,14 @@ object CelebornConf extends Logging {
       .timeConf(TimeUnit.MILLISECONDS)
       .createWithDefaultString("5s")
 
+  val RESERVE_SLOTS_RPC_TIMEOUT: ConfigEntry[Long] =
+    buildConf("celeborn.rpc.reserveSlots.askTimeout")
+      .categories("client")
+      .version("0.3.0")
+      .doc("Timeout for LifecycleManager request reserve slots.")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .createWithDefaultString("30s")

Review Comment:
   30s seems too short; perhaps 60s?





[GitHub] [incubator-celeborn] waitinfuture merged pull request #1461: [CELEBORN-556][BUG] ReserveSlot should not use default RPC time out since register shuffle max timeout is network timeout

Posted by "waitinfuture (via GitHub)" <gi...@apache.org>.
waitinfuture merged PR #1461:
URL: https://github.com/apache/incubator-celeborn/pull/1461




[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1461: [CELEBORN-556][BUG] ReserveSlot should not use default RPC time out since register shuffle max timeout is network timeout

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu commented on PR #1461:
URL: https://github.com/apache/incubator-celeborn/pull/1461#issuecomment-1537093886

   ping @waitinfuture @RexXiong 




[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1461: [CELEBORN-556][BUG] ReserveSlot should not use default RPC time out since register shuffle max timeout is network timeout

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu commented on code in PR #1461:
URL: https://github.com/apache/incubator-celeborn/pull/1461#discussion_r1189334479


##########
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala:
##########
@@ -654,6 +654,29 @@ class CelebornConf(loadDefaults: Boolean) extends Cloneable with Logging with Se
   def quotaManagerClass: String = get(QUOTA_MANAGER)
   def quotaConfigurationPath: Option[String] = get(QUOTA_CONFIGURATION_PATH)
 
+  // //////////////////////////////////////////////////////
+  //                Shuffle Client RPC                   //
+  // //////////////////////////////////////////////////////
+  def reserveSlotsRpcTimeout: RpcTimeout =
+    new RpcTimeout(get(RESERVE_SLOTS_RPC_TIMEOUT).milli, RESERVE_SLOTS_RPC_TIMEOUT.key)

Review Comment:
   Done.
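
For readers following the thread, the idea behind the change can be sketched in a few lines of self-contained Scala. This is only an illustration under stated assumptions, not the code in the PR: `NamedTimeout` is a hypothetical stand-in for Celeborn's `RpcTimeout`, a plain `Map` stands in for `CelebornConf`, and the object and method names are invented for the example. Only the key `celeborn.rpc.reserveSlots.askTimeout` and the `30s` default are taken from the diff above.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

// Hypothetical stand-in for an RPC timeout: a duration plus the config key that
// controls it, so that a timeout error can name the setting to tune.
final case class NamedTimeout(duration: FiniteDuration, configKey: String) {
  def awaitResult[T](future: Future[T]): T =
    try Await.result(future, duration)
    catch {
      case _: TimeoutException =>
        throw new TimeoutException(
          s"Timed out after $duration. This timeout is controlled by $configKey")
    }
}

object ReserveSlotsTimeoutSketch {
  // Key and default copied from the diff under review; the rest is illustrative.
  val ReserveSlotsKey = "celeborn.rpc.reserveSlots.askTimeout"
  val ReserveSlotsDefault = "30s"

  // Parse a time string such as "30s" or "500ms" into a finite duration.
  def parseTime(value: String): FiniteDuration = Duration(value) match {
    case finite: FiniteDuration => finite
    case other => throw new IllegalArgumentException(s"Not a finite duration: $other")
  }

  // Look up the dedicated key and fall back to its own default, rather than
  // to a shared general-purpose RPC timeout.
  def reserveSlotsTimeout(settings: Map[String, String]): NamedTimeout =
    NamedTimeout(
      parseTime(settings.getOrElse(ReserveSlotsKey, ReserveSlotsDefault)),
      ReserveSlotsKey)

  def main(args: Array[String]): Unit = {
    val byDefault = reserveSlotsTimeout(Map.empty)
    val overridden = reserveSlotsTimeout(Map(ReserveSlotsKey -> "60s"))
    println(s"default = ${byDefault.duration.toMillis} ms, " +
      s"overridden = ${overridden.duration.toMillis} ms")

    // A promise that is never completed models a worker that does not reply in time.
    val neverReplies: Future[Unit] = Promise[Unit]().future
    val short = reserveSlotsTimeout(Map(ReserveSlotsKey -> "50ms"))
    try short.awaitResult(neverReplies)
    catch { case e: TimeoutException => println(e.getMessage) }
  }
}
```

Carrying the config key alongside the duration means a timeout failure can point at the exact setting to raise, which is presumably why the accessor in the diff passes `RESERVE_SLOTS_RPC_TIMEOUT.key` to `RpcTimeout`.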





[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1461: [CELEBORN-556][BUG] ReserveSlot should not use default RPC time out since register shuffle max timeout is network timeout

Posted by "waitinfuture (via GitHub)" <gi...@apache.org>.
waitinfuture commented on code in PR #1461:
URL: https://github.com/apache/incubator-celeborn/pull/1461#discussion_r1189331627


##########
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala:
##########
@@ -654,6 +654,29 @@ class CelebornConf(loadDefaults: Boolean) extends Cloneable with Logging with Se
   def quotaManagerClass: String = get(QUOTA_MANAGER)
   def quotaConfigurationPath: Option[String] = get(QUOTA_CONFIGURATION_PATH)
 
+  // //////////////////////////////////////////////////////
+  //                Shuffle Client RPC                   //
+  // //////////////////////////////////////////////////////
+  def reserveSlotsRpcTimeout: RpcTimeout =
+    new RpcTimeout(get(RESERVE_SLOTS_RPC_TIMEOUT).milli, RESERVE_SLOTS_RPC_TIMEOUT.key)

Review Comment:
   whitespace





[GitHub] [incubator-celeborn] codecov[bot] commented on pull request #1461: [CELEBORN-556][BUG] ReserveSlot should not use default RPC time out since register shuffle max timeout is network timeout

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #1461:
URL: https://github.com/apache/incubator-celeborn/pull/1461#issuecomment-1522928290

   ## [Codecov](https://codecov.io/gh/apache/incubator-celeborn/pull/1461?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1461](https://codecov.io/gh/apache/incubator-celeborn/pull/1461?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (86caaf2) into [main](https://codecov.io/gh/apache/incubator-celeborn/commit/537fc94df298b22479159e50579228491a4b7353?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (537fc94) will **decrease** coverage by `0.00%`.
   > The diff coverage is `60.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##             main    #1461      +/-   ##
   ==========================================
   - Coverage   44.82%   44.81%   -0.00%     
   ==========================================
     Files         155      155              
     Lines        9586     9594       +8     
     Branches      955      955              
   ==========================================
   + Hits         4296     4299       +3     
   - Misses       5007     5011       +4     
   - Partials      283      284       +1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-celeborn/pull/1461?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...cala/org/apache/celeborn/common/CelebornConf.scala](https://codecov.io/gh/apache/incubator-celeborn/pull/1461?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29tbW9uL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvY2VsZWJvcm4vY29tbW9uL0NlbGVib3JuQ29uZi5zY2FsYQ==) | `87.02% <60.00%> (+0.05%)` | :arrow_up: |
   
   ... and [2 files with indirect coverage changes](https://codecov.io/gh/apache/incubator-celeborn/pull/1461/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   




[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1461: [CELEBORN-556][BUG] ReserveSlot should not use default RPC time out since register shuffle max timeout is network timeout

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu commented on PR #1461:
URL: https://github.com/apache/incubator-celeborn/pull/1461#issuecomment-1539929225

   Ping @RexXiong @waitinfuture, could you take a look? This issue causes register shuffle to fail.

