Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/27 20:48:34 UTC

[GitHub] [spark] xkrogen commented on pull request #34024: [SPARK-36784][SHUFFLE][WIP] Handle DNS issues on executor to prevent shuffle nodes from getting added to exclude list

xkrogen commented on pull request #34024:
URL: https://github.com/apache/spark/pull/34024#issuecomment-928264312


   > if a node is decommissioned/removed - fetch failures could result due to `UnknownHostException` as the remote host is no longer resolvable
   
   Agreed, but the proposed changes accommodate this situation. If an `UnknownHostException` is seen, we don't immediately assume that there are DNS issues. We first check whether some other "known good" hostname, as defined by `spark.network.dnsHealthCheck.host`, is resolvable, and only if _that_ host isn't resolvable do we assume there is a local DNS issue. If no such host is configured, we fall back to the current behavior, which is to assume that `UnknownHostException` indicates an issue with the remote host. See `Executor#isDNSResolvableIfConfigured` for more details.
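
To make the decision logic concrete, here is a rough, standalone sketch of the heuristic. It is not the code from the PR: the class name, the `Verdict` enum, the `Resolver` interface, and `classify` are all hypothetical names chosen for illustration, and the only real API used is `java.net.InetAddress.getByName`. The optional argument stands in for the value of `spark.network.dnsHealthCheck.host`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

// Illustrative sketch only; none of these names come from the PR.
class DnsHealthCheckSketch {

    // Hypothetical labels for the two conclusions the heuristic can reach.
    enum Verdict { REMOTE_HOST_ISSUE, LOCAL_DNS_ISSUE }

    // Small abstraction so the check can be exercised without real DNS.
    interface Resolver {
        void resolve(String host) throws UnknownHostException;
    }

    // Resolver backed by the JDK's name lookup.
    static final Resolver JDK_RESOLVER = host -> InetAddress.getByName(host);

    // Called after a fetch has already failed with UnknownHostException.
    // healthCheckHost models the configured "known good" hostname, if any.
    static Verdict classify(Optional<String> healthCheckHost, Resolver resolver) {
        if (!healthCheckHost.isPresent()) {
            // No health-check host configured: keep the current behavior
            // and blame the remote host.
            return Verdict.REMOTE_HOST_ISSUE;
        }
        try {
            resolver.resolve(healthCheckHost.get());
            // The known-good host resolves, so local DNS looks healthy and
            // the failure is attributed to the remote host.
            return Verdict.REMOTE_HOST_ISSUE;
        } catch (UnknownHostException e) {
            // Even the known-good host is unresolvable: assume local DNS trouble.
            return Verdict.LOCAL_DNS_ISSUE;
        }
    }

    public static void main(String[] args) {
        // Stub resolvers simulating broken and healthy local DNS.
        Resolver dnsDown = host -> { throw new UnknownHostException(host); };
        Resolver dnsUp = host -> { };

        System.out.println(classify(Optional.empty(), dnsDown));
        System.out.println(classify(Optional.of("known-good.example.com"), dnsUp));
        System.out.println(classify(Optional.of("known-good.example.com"), dnsDown));
    }
}
```

Only the last case, where a health-check host is configured and cannot be resolved, leads to the "local DNS issue" conclusion; a decommissioned remote node with otherwise healthy DNS still falls into the remote-host branch.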
   
   This isn't perfect, but I think it acts as a very practical heuristic that should cover the majority of cases.
   
   cc @venkata91 as well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
