Posted to issues@celeborn.apache.org by "AngersZhuuuu (via GitHub)" <gi...@apache.org> on 2023/04/27 11:24:11 UTC

[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1469: [CELEBORN-563] Remove unnecessary code about handle Revive rpc after handle StageEnd

AngersZhuuuu opened a new pull request, #1469:
URL: https://github.com/apache/incubator-celeborn/pull/1469

   ### What changes were proposed in this pull request?
   Worker
   ```
   22/10/25 15:09:21,828 DEBUG [dispatcher-event-loop-14] Worker: Received ReserveSlots request, application_1665646136650_2510409_1-33, master partitions: 478-1; slave partitions: .
   22/10/25 15:09:21,828 INFO [dispatcher-event-loop-14] Worker: Reserved 1 master location and 0 slave location for application_1665646136650_2510409_1-33 master: [PartitionLocation[478-1 10.130.86.41:9092:9094:9093:9095 Mode: Master peer: 10.130.75.25:9092:9094:9093storage hint:MEMORY]]
   22/10/25 15:25:27,728 DEBUG [dispatcher-event-loop-43] Worker: Received CommitFiles request, application_1665646136650_2510409_1-33, master files 64-1,833-0,1345-1,164-1,584-2,1161-1,1129-2,1131-0,237-0,818-1,819-1,1429-0,535-0,733-1,478-1; slave files 832-0,672-1,130-1,134-1,1130-0,1034-1,236-0,268-1,883-1,1428-0,564-1,534-0,508-1,221-1.
   22/10/25 15:25:31,807 DEBUG [dispatcher-event-loop-32] Worker: Received ReserveSlots request, application_1665646136650_2510409_1-33, master partitions: 478-1; slave partitions: .
   22/10/25 15:25:32,130 INFO [dispatcher-event-loop-32] Worker: Reserved 1 master location and 0 slave location for application_1665646136650_2510409_1-33 master: [PartitionLocation[478-1 10.130.86.41:9092:9094:9093:9095 Mode: Master peer: 10.130.75.25:9092:9094:9093storage hint:MEMORY]]
   
   ```
   LifecycleManager
   ```
   22/10/25 15:09:21 WARN LifecycleManager: Batch handle change partition for application_1665646136650_2510409_1 of [33-478-0]
   22/10/25 15:09:21 INFO LifecycleManager: Reserve buffer success for application_1665646136650_2510409_1-33
   22/10/25 15:09:21 WARN LifecycleManager: Renew 33 [(478 0 -> 1)]  success.
   22/10/25 15:25:31 INFO LifecycleManager: Succeed to handle stageEnd for 33.
   22/10/25 15:25:31 WARN LifecycleManager: New partition not found, old partition 33 478-0
   22/10/25 15:25:31 WARN LifecycleManager: Batch handle change partition for application_1665646136650_2510409_1 of [33-478-0]
   22/10/25 15:25:31 WARN LifecycleManager: Renew 33 []  success. 
   ```
   
   The failure case (**in the old code**):
   
   1. Different tasks request a revive for the same PartitionLocation (for example, a rerun task or a speculative task).
   2. One request is handled and the task succeeds.
   3. LifecycleManager calls handleStageEnd and removes the PartitionLocation from shuffleAllocatedWorkers.
   4. The delayed revive request is handled only after handleStageEnd has removed the PartitionLocation.
   5. getLatestPartition returns null, so LifecycleManager revives the same partition location again.
   6. Unfortunately, both revives land on the same worker, so the worker creates the same PartitionLocation twice.
   
   
   But in #570 we changed getLatestPartition so that it no longer returns null. The delayed request now gets back the PartitionLocation that was already revived successfully, so the logic removed by this PR is no longer needed.
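
The sequence above can be replayed with a small toy model. This is only a sketch under stated assumptions, not Celeborn's actual code: `ToyLifecycleManager`, the `latest` map, and the `lookup_survives_stage_end` flag are hypothetical names introduced for illustration. `allocated` stands in for shuffleAllocatedWorkers, and the flag switches between a lookup that reads from it (old behavior) and one that does not (the behavior described for #570).

```python
from typing import Dict, List, Optional, Tuple

PartitionKey = Tuple[int, int]  # (shuffle_id, partition_id)


class ToyLifecycleManager:
    """Toy model of the race; NOT Celeborn's real LifecycleManager."""

    def __init__(self, lookup_survives_stage_end: bool) -> None:
        self.lookup_survives_stage_end = lookup_survives_stage_end
        # Stand-in for shuffleAllocatedWorkers: cleared when the stage ends.
        self.allocated: Dict[PartitionKey, int] = {}
        # Hypothetical separate record of the latest epoch per partition.
        self.latest: Dict[PartitionKey, int] = {}
        # Every (key, epoch) reservation that would be sent to a worker.
        self.reserve_log: List[Tuple[PartitionKey, int]] = []

    def register(self, key: PartitionKey, epoch: int = 0) -> None:
        self.allocated[key] = epoch
        self.latest[key] = epoch

    def get_latest_partition(self, key: PartitionKey) -> Optional[int]:
        # Old behavior reads the map that handle_stage_end clears.
        source = self.latest if self.lookup_survives_stage_end else self.allocated
        return source.get(key)

    def handle_revive(self, key: PartitionKey, old_epoch: int) -> int:
        latest = self.get_latest_partition(key)
        if latest is not None and latest > old_epoch:
            return latest  # another request already revived this partition
        new_epoch = old_epoch + 1
        self.reserve_log.append((key, new_epoch))  # ReserveSlots on a worker
        self.allocated[key] = new_epoch
        self.latest[key] = new_epoch
        return new_epoch

    def handle_stage_end(self, shuffle_id: int) -> None:
        for key in [k for k in self.allocated if k[0] == shuffle_id]:
            del self.allocated[key]


def replay(lookup_survives_stage_end: bool) -> List[Tuple[PartitionKey, int]]:
    lm = ToyLifecycleManager(lookup_survives_stage_end)
    key = (33, 478)
    lm.register(key, 0)
    lm.handle_revive(key, 0)   # first task's revive: 478-0 -> 478-1
    lm.handle_stage_end(33)    # stage 33 ends, allocation map is cleared
    lm.handle_revive(key, 0)   # delayed revive from a rerun/speculative task
    return lm.reserve_log
```

With `replay(False)` the log contains the reservation for 478-1 twice, matching the two ReserveSlots entries in the worker log above. With `replay(True)` the delayed request finds epoch 1 and no second reservation is made.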
   
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1469: [CELEBORN-563] Remove unnecessary code about handle Revive rpc after handle StageEnd

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu commented on PR #1469:
URL: https://github.com/apache/incubator-celeborn/pull/1469#issuecomment-1534284218

   ping @waitinfuture 




[GitHub] [incubator-celeborn] codecov[bot] commented on pull request #1469: [CELEBORN-563] Remove unnecessary code about handle Revive rpc after handle StageEnd

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #1469:
URL: https://github.com/apache/incubator-celeborn/pull/1469#issuecomment-1525537636

   ## [Codecov](https://codecov.io/gh/apache/incubator-celeborn/pull/1469?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1469](https://codecov.io/gh/apache/incubator-celeborn/pull/1469?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e52dec9) into [main](https://codecov.io/gh/apache/incubator-celeborn/commit/7a4f2ebd8add60b7bc3f1a04ac2165ce72d5e254?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7a4f2eb) will **decrease** coverage by `0.17%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@            Coverage Diff             @@
   ##             main    #1469      +/-   ##
   ==========================================
   - Coverage   44.96%   44.78%   -0.17%     
   ==========================================
     Files         155      155              
     Lines        9587     9587              
     Branches      955      955              
   ==========================================
   - Hits         4310     4293      -17     
   - Misses       4996     5011      +15     
   - Partials      281      283       +2     
   ```
   
   
   [see 3 files with indirect coverage changes](https://codecov.io/gh/apache/incubator-celeborn/pull/1469/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   




[GitHub] [incubator-celeborn] RexXiong commented on pull request #1469: [CELEBORN-563] Remove unnecessary code about handle Revive rpc after handle StageEnd

Posted by "RexXiong (via GitHub)" <gi...@apache.org>.
RexXiong commented on PR #1469:
URL: https://github.com/apache/incubator-celeborn/pull/1469#issuecomment-1535682460

   LGTM, thanks. Currently getLatestPartition doesn't depend on shuffleAllocatedWorkers.




[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1469: [CELEBORN-563] Remove unnecessary code about handle Revive rpc after handle StageEnd

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu commented on PR #1469:
URL: https://github.com/apache/incubator-celeborn/pull/1469#issuecomment-1525521809

   cc @waitinfuture @RexXiong 




[GitHub] [incubator-celeborn] AngersZhuuuu merged pull request #1469: [CELEBORN-563] Remove unnecessary code about handle Revive rpc after handle StageEnd

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu merged PR #1469:
URL: https://github.com/apache/incubator-celeborn/pull/1469

