Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/07/09 13:37:20 UTC

[GitHub] [spark] Ngone51 commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone

Ngone51 commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
URL: https://github.com/apache/spark/pull/25047#issuecomment-509645441
 
 
   > How do you know the client is running on a node with GPU's or a worker? I guess as long as location is the same it doesn't matter.
   
   Yes, you're right. Currently I use `SPARK_HOME/spark_resources`, but the location may need to change, as you pointed out, because of the user's permissions on `SPARK_HOME`.
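   To make that permission concern concrete, here is a minimal, purely illustrative sketch (not what the PR actually does) of how a worker could prefer `SPARK_HOME` for the coordination file but fall back to a user-writable directory when `SPARK_HOME` is not writable:

```scala
import java.nio.file.{Files, Path, Paths}

// Illustrative only: prefer SPARK_HOME for the shared coordination file,
// but fall back to the JVM temp dir when SPARK_HOME is not writable by the
// current user. The file name "spark_resources" is the one mentioned above.
object ResourceFileLocator {
  def coordinationFile(): Path = {
    val sparkHome = Paths.get(sys.env.getOrElse("SPARK_HOME", "."))
    val base =
      if (Files.isWritable(sparkHome)) sparkHome
      else Paths.get(sys.props("java.io.tmpdir"))
    base.resolve("spark_resources")
  }
}
```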
   
   > It seems unreliable to assume you have multiple workers per node (for the case a worker crashes). 
   
   In this PR, we have two ways to prevent resource leaks (I suppose that's what you're concerned about, right?) when a worker crashes:
   1. We register a signal handler (via [SignalUtils](https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/core/src/main/scala/org/apache/spark/util/SignalUtils.scala)) for the TERM signal (whether it comes from `kill worker-pid` or `stop-slave.sh`) in each Worker. The handler lets the worker release its entries in the allocated-resources file before it exits (see the first sketch after this list).
   
   2. The Master learns that a worker has crashed when it stops receiving heartbeats from that worker for a configured timeout. Once the Master knows the worker has crashed, it randomly selects another healthy worker on the same host and asks it to release the crashed worker's allocated resources (see the second sketch below).
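   Here is a minimal sketch of the first mechanism. The `WorkerResourceCleanup` object, the package placement (needed only because `SignalUtils` is `private[spark]`), and the one-line-per-worker file format are assumptions for illustration, not the PR's actual code:

```scala
package org.apache.spark.deploy.worker

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

import scala.collection.JavaConverters._

import org.apache.spark.util.SignalUtils

// Hypothetical worker-side hook: when a TERM signal arrives, drop this
// worker's entries from the shared allocated-resources file before the
// process exits. The file location and format here are illustrative only.
object WorkerResourceCleanup {

  private val allocationFile: Path =
    Paths.get(sys.env.getOrElse("SPARK_HOME", "."), "spark_resources")

  def install(workerId: String): Unit = {
    SignalUtils.register("TERM") {
      releaseAllocations(workerId)
      // Returning false lets any previously registered TERM handler
      // (e.g. the default "exit the JVM" behaviour) still run.
      false
    }
  }

  private def releaseAllocations(workerId: String): Unit = synchronized {
    if (Files.exists(allocationFile)) {
      // Assume one "workerId address1,address2,..." line per worker.
      val remaining = Files.readAllLines(allocationFile, StandardCharsets.UTF_8)
        .asScala
        .filterNot(_.startsWith(workerId + " "))
      Files.write(allocationFile, remaining.asJava, StandardCharsets.UTF_8)
    }
  }
}
```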
   
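   And a self-contained sketch of the second mechanism, i.e. how the Master could pick a co-located healthy worker to do the cleanup. `WorkerInfo`, `ReleaseResources` and the `send` callback are hypothetical stand-ins for the Master's internal state and RPC, not the PR's actual classes:

```scala
import scala.util.Random

// Hypothetical shapes standing in for the Master's bookkeeping and RPC.
final case class WorkerInfo(id: String, host: String, alive: Boolean)
final case class ReleaseResources(deadWorkerId: String)

object MasterCrashCleanup {

  /** Choose a random healthy worker co-located with the crashed one, if any. */
  def chooseCleaner(dead: WorkerInfo, workers: Seq[WorkerInfo]): Option[WorkerInfo] = {
    val candidates =
      workers.filter(w => w.alive && w.host == dead.host && w.id != dead.id)
    if (candidates.isEmpty) None
    else Some(candidates(Random.nextInt(candidates.size)))
  }

  /** Called once the Master decides a worker is dead (heartbeat timeout). */
  def onWorkerTimedOut(
      dead: WorkerInfo,
      workers: Seq[WorkerInfo],
      send: (WorkerInfo, ReleaseResources) => Unit): Unit = {
    chooseCleaner(dead, workers).foreach { cleaner =>
      // The chosen worker would then remove the dead worker's entries from
      // the shared allocated-resources file on that host.
      send(cleaner, ReleaseResources(dead.id))
    }
  }
}
```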
