You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "warrenzhu25 (via GitHub)" <gi...@apache.org> on 2023/09/23 00:08:43 UTC

[GitHub] [spark] warrenzhu25 opened a new pull request, #43067: [SPARK-45057][CORE] Avoid acquire read lock when keepReadLock is false

warrenzhu25 opened a new pull request, #43067:
URL: https://github.com/apache/spark/pull/43067

   ### What changes were proposed in this pull request?
   Add `keepReadLock` parameter in `lockNewBlockForWriting()`. When `keepReadLock` is `false`, skip `lockForReading()` to avoid block on read Lock or potential deadlock issue.
   
   When 2 tasks try to compute same rdd with replication level of 2 and running on only 2 executors. Deadlock will happen. Details refer [SPARK-45057]
   
   Task thread hold write lock and waiting for replication to remote executor while shuffle server thread which handling block upload request waiting on `lockForReading` in [BlockInfoManager.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala#L457C24-L457C24)
   
   ### Why are the changes needed?
   This could save unnecessary read lock acquire and avoid deadlock issue mention above. 
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added UT in BlockInfoManagerSuite
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #43067: [SPARK-45057][CORE] Avoid acquire read lock when keepReadLock is false

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on PR #43067:
URL: https://github.com/apache/spark/pull/43067#issuecomment-1736568789

   The test failures are unrelated, but can you retrigger them @warrenzhu25 ?
   Ideally, would prefer a clean build before merging.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #43067: [SPARK-45057][CORE] Avoid acquire read lock when keepReadLock is false

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on PR #43067:
URL: https://github.com/apache/spark/pull/43067#issuecomment-1740135449

   Looks like multiple PR's are impacted by it - and in this case, it is not related.
   Merging to master, 3.5, 3.4, 3.3
   
   Thanks for fixing this @warrenzhu25 !
   Thanks for reviews @JoshRosen, @Ngone51 :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] warrenzhu25 commented on pull request #43067: [SPARK-45057][CORE] Avoid acquire read lock when keepReadLock is false

Posted by "warrenzhu25 (via GitHub)" <gi...@apache.org>.
warrenzhu25 commented on PR #43067:
URL: https://github.com/apache/spark/pull/43067#issuecomment-1739870131

   > The test failures are unrelated, but can you retrigger them @warrenzhu25 ? Ideally, would prefer a clean build before merging.
   
   I tried to rebuild several times, but it seems still failing and hanging on `LBFGSClusterSuite`, but it should be unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen commented on a diff in pull request #43067: [SPARK-45057][CORE] Avoid acquire read lock when keepReadLock is false

Posted by "JoshRosen (via GitHub)" <gi...@apache.org>.
JoshRosen commented on code in PR #43067:
URL: https://github.com/apache/spark/pull/43067#discussion_r1337527770


##########
core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala:
##########
@@ -421,7 +421,8 @@ private[storage] class BlockInfoManager(trackingCacheVisibility: Boolean = false
    */
   def lockNewBlockForWriting(

Review Comment:
   Minor doc nit:
   
   The `@return` docstring a couple of lines should probably be updated because the existing statement about read locks is inaccurate after your change.
   
   I think the new condition could be something like this
   
   > `@return` true if the block did not already exist, false otherwise. If this returns `true`, a write lock on the new block will be held. If this returns `false` then a read lock will be held if `keepReadLock == true`. 



##########
core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala:
##########
@@ -421,7 +421,8 @@ private[storage] class BlockInfoManager(trackingCacheVisibility: Boolean = false
    */
   def lockNewBlockForWriting(

Review Comment:
   Minor doc nit:
   
   The `@return` docstring a couple of lines should probably be updated because the existing statement about read locks is inaccurate after your change.
   
   I think the new condition could be something like this
   
   > `@return` true if the block did not already exist, false otherwise. If this returns `true`, a write lock on the new block will be held. If this returns `false` then a read lock will be held iff `keepReadLock == true`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm closed pull request #43067: [SPARK-45057][CORE] Avoid acquire read lock when keepReadLock is false

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm closed pull request #43067: [SPARK-45057][CORE] Avoid acquire read lock when keepReadLock is false
URL: https://github.com/apache/spark/pull/43067


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] warrenzhu25 commented on pull request #43067: [SPARK-45057][CORE] Avoid acquire read lock when keepReadLock is false

Posted by "warrenzhu25 (via GitHub)" <gi...@apache.org>.
warrenzhu25 commented on PR #43067:
URL: https://github.com/apache/spark/pull/43067#issuecomment-1732462298

   > Thank you for updating, @warrenzhu25 .
   > 
   > BTW, GitHub Action seems to lose your running CI link. Could you post the your running GitHub Action link here?
   
   Retriggered build.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] warrenzhu25 commented on pull request #43067: [SPARK-45057][CORE] Avoid acquire read lock when keepReadLock is false

Posted by "warrenzhu25 (via GitHub)" <gi...@apache.org>.
warrenzhu25 commented on PR #43067:
URL: https://github.com/apache/spark/pull/43067#issuecomment-1732457914

   > Could you rebase to `master` branch once more, @warrenzhu25 ?
   
   Rebased.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #43067: [SPARK-45057][CORE] Avoid acquire read lock when keepReadLock is false

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43067:
URL: https://github.com/apache/spark/pull/43067#issuecomment-1732460692

   Thank you for updating, @warrenzhu25 . 
   
   BTW, GitHub Action seems to lose your running CI link. Could you post the your running GitHub Action link here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] warrenzhu25 commented on pull request #43067: [SPARK-45057][CORE] Avoid acquire read lock when keepReadLock is false

Posted by "warrenzhu25 (via GitHub)" <gi...@apache.org>.
warrenzhu25 commented on PR #43067:
URL: https://github.com/apache/spark/pull/43067#issuecomment-1734631336

   @JoshRosen Could you help take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org