
Posted to reviews@spark.apache.org by "wangshengjie123 (via GitHub)" <gi...@apache.org> on 2023/03/13 06:19:13 UTC

[GitHub] [spark] wangshengjie123 opened a new pull request, #40391: [WIP][SPARK-42766][YARN] YarnAllocator filter excluded nodes when launching containers

wangshengjie123 opened a new pull request, #40391:
URL: https://github.com/apache/spark/pull/40391

### What changes were proposed in this pull request?

In a production environment, we hit the following issue:

Suppose we request 10 containers from nodeA and nodeB. The first response from YARN returns 5 containers from nodeA and nodeB, and then nodeA is blacklisted (excluded). The second response from YARN may still return some containers from nodeA, and those containers are launched. When these containers (executors) start up and send a register request to the driver, the request is rejected. Each such failure is counted toward `spark.yarn.max.executor.failures` and can cause the app to fail with: `Max number of executor failures ($maxNumExecutorFailures) reached`.

### Why are the changes needed?

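Filter out excluded nodes when launching containers, so that executors that the driver would reject anyway are never started and do not push the app over the executor failure limit.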
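
To make the idea concrete, here is a minimal, self-contained sketch of the filtering step. It is a simplified model, not the actual `YarnAllocator` code: the `Container` class and the `split_by_exclusion` function are hypothetical names introduced only for illustration, and a real implementation would presumably also release the rejected containers back to YARN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """Hypothetical stand-in for a container returned by the resource manager."""
    container_id: str
    host: str


def split_by_exclusion(allocated, excluded_hosts):
    """Partition allocated containers into (launchable, rejected).

    A container is rejected when its host is in the excluded set at the
    time of launch, even if the node was still healthy when requested.
    """
    launchable, rejected = [], []
    for container in allocated:
        if container.host in excluded_hosts:
            rejected.append(container)
        else:
            launchable.append(container)
    return launchable, rejected


# Second allocation response arrives after nodeA has been excluded.
allocated = [
    Container("c1", "nodeA"),
    Container("c2", "nodeB"),
    Container("c3", "nodeA"),
]
launchable, rejected = split_by_exclusion(allocated, {"nodeA"})
print([c.container_id for c in launchable])  # ['c2']
print([c.container_id for c in rejected])    # ['c1', 'c3']
```

Only `c2` would be launched in this scenario; the two nodeA containers never start an executor, so they cannot produce a rejected registration.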

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added unit tests (UT).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] github-actions[bot] closed pull request #40391: [SPARK-42766][YARN] YarnAllocator filter excluded nodes when launching containers

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] closed pull request #40391: [SPARK-42766][YARN] YarnAllocator filter excluded nodes when launching containers
URL: https://github.com/apache/spark/pull/40391



[GitHub] [spark] wangshengjie123 commented on pull request #40391: [SPARK-42766][YARN] YarnAllocator filter excluded nodes when launching containers

Posted by "wangshengjie123 (via GitHub)" <gi...@apache.org>.

wangshengjie123 commented on PR #40391:
URL: https://github.com/apache/spark/pull/40391#issuecomment-1467201970

   @Ngone51 @tgravescs could you please help review this pr when you have time, thanks.



[GitHub] [spark] github-actions[bot] commented on pull request #40391: [SPARK-42766][YARN] YarnAllocator filter excluded nodes when launching containers

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on PR #40391:
URL: https://github.com/apache/spark/pull/40391#issuecomment-1603469716

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!



[GitHub] [spark] wangshengjie123 commented on pull request #40391: [WIP][SPARK-42766][YARN] YarnAllocator filter excluded nodes when launching containers

Posted by "wangshengjie123 (via GitHub)" <gi...@apache.org>.

wangshengjie123 commented on PR #40391:
URL: https://github.com/apache/spark/pull/40391#issuecomment-1465682073

   I am not sure whether we should add an executor exit code and optimize the RegisterExecutor response message in this PR. In our production environment, we found that filtering excluded nodes only when launching containers does not always work as well as we want. The driver may not request new executors for a period of time, so the excluded node list is not sent to `YarnAllocator` through `requestTotalExecutorsWithPreferredLocalities`, and some executors still fail and are counted toward the failure count.

   So we added the executor exit code and optimized the RegisterExecutor response message.
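
A rough sketch of how a dedicated exit code could keep such executors out of the failure count is shown below. This is an illustrative model only, not the code in this PR: the constant `EXCLUDED_NODE_EXIT_CODE`, its value, and both function names are hypothetical, and the `>=` comparison against the limit is a simplifying assumption.

```python
# Hypothetical exit code for "registration rejected because the node is excluded".
EXCLUDED_NODE_EXIT_CODE = 1001


def count_failures(exit_codes, ignored=frozenset({EXCLUDED_NODE_EXIT_CODE})):
    """Count executor exits that should be charged against the failure limit.

    Exit code 0 is a clean exit; codes in `ignored` are known, harmless
    rejections and are skipped.
    """
    return sum(1 for code in exit_codes if code != 0 and code not in ignored)


def should_fail_app(exit_codes, max_failures, ignored=frozenset({EXCLUDED_NODE_EXIT_CODE})):
    """Return True when the counted failures reach the configured limit."""
    return count_failures(exit_codes, ignored) >= max_failures


# Two real failures (1 and 137) plus two rejections on an excluded node.
exit_codes = [1, EXCLUDED_NODE_EXIT_CODE, EXCLUDED_NODE_EXIT_CODE, 0, 137]

print(count_failures(exit_codes))                      # 2
print(should_fail_app(exit_codes, 3))                  # False
print(should_fail_app(exit_codes, 3, ignored=frozenset()))  # True
```

With the dedicated code ignored, the app stays under a limit of 3; if every non-zero exit were counted, the same run would reach the limit and fail.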

