Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/02 08:16:16 UTC

[GitHub] [spark] Ngone51 opened a new pull request, #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Ngone51 opened a new pull request, #38876:
URL: https://github.com/apache/spark/pull/38876

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   
   This PR mainly proposes to reject BlockManager re-registration if the executor has already been considered lost/dead by the scheduler backend.
   
   Along with the major proposal, this PR also includes a few other changes:
   * Only post `SparkListenerBlockManagerAdded` event when the registration succeeds
   * Return an "invalid" executor id when the re-registration fails
   * Do not report all blocks when the re-registration fails
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   BlockManager re-registration from a dead or terminating executor has led to some known issues, e.g., a falsely active executor showing up in the UI (SPARK-35011), [block fetching to the dead executor](https://github.com/apache/spark/pull/32114#issuecomment-899979045). And since the executor itself does not re-register with the driver, BlockManager re-registration is meaningless when the executor is already lost.
   
   Regarding the corner case where the re-registration event arrives before the lost executor is actually removed from the scheduler backend, I think it is not possible. Re-registration is only required when the BlockManager master doesn't see the block manager in `blockManagerInfo`, and a block manager is only removed from `blockManagerInfo` either when the executor is already known to be lost or when it is removed by the driver proactively. So the executor should always be removed from the scheduler backend before the re-registration event arrives.
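   
   For reference, the core of the proposed guard looks roughly like the following (a simplified sketch, not the exact patch; the actual diffs are quoted in the review comments below):
   
   ```scala
   // BlockManagerMasterEndpoint#register, simplified sketch.
   // Only consult the scheduler backend when this is a re-registration attempt.
   lazy val isExecutorAlive =
     driverEndpoint.askSync[Boolean](CoarseGrainedClusterMessages.IsExecutorAlive(id.executorId))
   
   if (!blockManagerInfo.contains(id) && (!isReRegister || isExecutorAlive)) {
     // ...register the block manager and post SparkListenerBlockManagerAdded...
   }
   
   // When a re-registration is rejected, return an id carrying an "invalid" executor id
   // so the caller (BlockManager.reregister) knows not to report its blocks.
   val updatedId = if (isReRegister && !isExecutorAlive) {
     BlockManagerId(BlockManagerId.INVALID_EXECUTOR_ID, id.host, id.port)
   } else {
     id
   }
   ```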
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   
   Unit test
   




[GitHub] [spark] dongjoon-hyun commented on pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #38876:
URL: https://github.com/apache/spark/pull/38876#issuecomment-1347179731

   Thank you, @Ngone51 , @mridulm , @jiangxb1987 .




[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1040544977



##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala:
##########
@@ -583,7 +586,12 @@ class BlockManagerMasterEndpoint(
 
     val time = System.currentTimeMillis()
     executorIdToLocalDirs.put(id.executorId, localDirs)
-    if (!blockManagerInfo.contains(id)) {
+    // SPARK-41360: For the block manager re-registration, we should only allow it when
+    // the executor is recognized as active by the scheduler backend. Otherwise, this kind
+    // of re-registration from the terminating/stopped executor is meaningless and harmful.
+    lazy val isExecutorAlive =
+      driverEndpoint.askSync[Boolean](CoarseGrainedClusterMessages.IsExecutorAlive(id.executorId))
+    if (!blockManagerInfo.contains(id) && (!isReRegister || isExecutorAlive)) {

Review Comment:
   > Wont the driver not remove in case of a heartbeat expiry even though the executor did not disconnect ? (for the same reasons as above - long gc pause, network partition, etc)
   
   It will. But in these cases, the driver could fail to kill the executor.
   
   (@mridulm  No worries, take care:))





[GitHub] [spark] mridulm commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1044131362


##########
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##########
@@ -637,9 +637,11 @@ private[spark] class BlockManager(
   def reregister(): Unit = {
     // TODO: We might need to rate limit re-registering.
     logInfo(s"BlockManager $blockManagerId re-registering with master")
-    master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString, maxOnHeapMemory,
-      maxOffHeapMemory, storageEndpoint)
-    reportAllBlocks()
+    val id = master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString,
+      maxOnHeapMemory, maxOffHeapMemory, storageEndpoint, isReRegister = true)
+    if (id.executorId != BlockManagerId.INVALID_EXECUTOR_ID) {
+      reportAllBlocks()
+    }

Review Comment:
   Note - if we do change this - it introduces a new code path.
   So I don't want to block the PR on this discussion - but it would be good to understand this case better :-)





[GitHub] [spark] mridulm commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1043005698


##########
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##########
@@ -637,9 +637,11 @@ private[spark] class BlockManager(
   def reregister(): Unit = {
     // TODO: We might need to rate limit re-registering.
     logInfo(s"BlockManager $blockManagerId re-registering with master")
-    master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString, maxOnHeapMemory,
-      maxOffHeapMemory, storageEndpoint)
-    reportAllBlocks()
+    val id = master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString,
+      maxOnHeapMemory, maxOffHeapMemory, storageEndpoint, isReRegister = true)
+    if (id.executorId != BlockManagerId.INVALID_EXECUTOR_ID) {
+      reportAllBlocks()
+    }

Review Comment:
   Do we want to terminate in case of `INVALID_EXECUTOR_ID` ?





[GitHub] [spark] Ngone51 commented on pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on PR #38876:
URL: https://github.com/apache/spark/pull/38876#issuecomment-1334894364

   cc @mridulm @jiangxb1987 @attilapiros @wankunde @sumeetgajjar 




[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1039081212


##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala:
##########
@@ -616,10 +624,29 @@ class BlockManagerMasterEndpoint(
       if (pushBasedShuffleEnabled) {
         addMergerLocation(id)
       }
+      listenerBus.post(SparkListenerBlockManagerAdded(time, id,
+        maxOnHeapMemSize + maxOffHeapMemSize, Some(maxOnHeapMemSize), Some(maxOffHeapMemSize)))
     }
-    listenerBus.post(SparkListenerBlockManagerAdded(time, id, maxOnHeapMemSize + maxOffHeapMemSize,
-        Some(maxOnHeapMemSize), Some(maxOffHeapMemSize)))
-    id
+    val updatedId = if (isReRegister && !isExecutorAlive) {
+      assert(!blockManagerInfo.contains(id),
+        "BlockManager re-registration shouldn't succeed when the executor is lost")

Review Comment:
   > Does this assertion need to always hold ?
   
   It does.





[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1044198454


##########
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##########
@@ -637,9 +637,11 @@ private[spark] class BlockManager(
   def reregister(): Unit = {
     // TODO: We might need to rate limit re-registering.
     logInfo(s"BlockManager $blockManagerId re-registering with master")
-    master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString, maxOnHeapMemory,
-      maxOffHeapMemory, storageEndpoint)
-    reportAllBlocks()
+    val id = master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString,
+      maxOnHeapMemory, maxOffHeapMemory, storageEndpoint, isReRegister = true)
+    if (id.executorId != BlockManagerId.INVALID_EXECUTOR_ID) {
+      reportAllBlocks()
+    }

Review Comment:
   The case we met is that the executor failed to be killed by both `StopExecutor` and `ExecutorRunner.killProcess`. On second thought, maybe killing (via `System.exit`) from inside would give one more chance to handle this case?
   





[GitHub] [spark] mridulm commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1038742186


##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala:
##########
@@ -583,7 +586,12 @@ class BlockManagerMasterEndpoint(
 
     val time = System.currentTimeMillis()
     executorIdToLocalDirs.put(id.executorId, localDirs)
-    if (!blockManagerInfo.contains(id)) {
+    // SPARK-41360: For the block manager re-registration, we should only allow it when
+    // the executor is recognized as active by the scheduler backend. Otherwise, this kind
+    // of re-registration from the terminating/stopped executor is meaningless and harmful.
+    lazy val isExecutorAlive =
+      driverEndpoint.askSync[Boolean](CoarseGrainedClusterMessages.IsExecutorAlive(id.executorId))
+    if (!blockManagerInfo.contains(id) && (!isReRegister || isExecutorAlive)) {

Review Comment:
   If this is an issue only for terminating executors, we can detect that on the executor side and propagate it in the registration request, right? Or are there other cases as well?
   Otherwise, a transient network partition could result in losing all executors?





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1047077651


##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala:
##########
@@ -63,7 +63,8 @@ private[spark] object BlockManagerMessages {
       localDirs: Array[String],
       maxOnHeapMemSize: Long,
       maxOffHeapMemSize: Long,
-      sender: RpcEndpointRef)
+      sender: RpcEndpointRef,
+      isReRegister: Boolean)

Review Comment:
   Not sure why or how, but it seems like branch-3.3 complains about this in the MiMa binary compatibility test (https://github.com/apache/spark/actions/runs/3683054520/jobs/6231260546).
   
   ```
   [error] spark-core: Failed binary compatibility check against org.apache.spark:spark-core_2.13:3.2.0! Found 4 potential problems (filtered 921)
   [error]  * method copy(org.apache.spark.storage.BlockManagerId,Array[java.lang.String],Long,Long,org.apache.spark.rpc.RpcEndpointRef)org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager in class org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager does not have a correspondent in current version
   [error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.copy")
   [error]  * method this(org.apache.spark.storage.BlockManagerId,Array[java.lang.String],Long,Long,org.apache.spark.rpc.RpcEndpointRef)Unit in class org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager does not have a correspondent in current version
   [error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.this")
   [error]  * the type hierarchy of object org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager is different in current version. Missing types {scala.runtime.AbstractFunction5}
   [error]    filter with: ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.storage.BlockManagerMessages$RegisterBlockManager$")
   [error]  * method apply(org.apache.spark.storage.BlockManagerId,Array[java.lang.String],Long,Long,org.apache.spark.rpc.RpcEndpointRef)org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager in object org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager does not have a correspondent in current version
   [error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.apply")
   ```
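   
   The usual fix for this kind of failure is to add the printed filters to `project/MimaExcludes.scala` on the affected branch. A hedged sketch based on the filters above (the object and value names below are assumptions; the actual follow-up is linked later in this thread and may differ):
   
   ```scala
   // Sketch of the kind of entry project/MimaExcludes.scala typically gets.
   // RegisterBlockManager is private[spark], so excluding its changed
   // constructor/copy/apply signatures is safe.
   import com.typesafe.tools.mima.core._
   
   object MimaExcludesSketch {
     lazy val v33excludes = Seq(
       ProblemFilters.exclude[DirectMissingMethodProblem](
         "org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.copy"),
       ProblemFilters.exclude[DirectMissingMethodProblem](
         "org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.this"),
       ProblemFilters.exclude[MissingTypesProblem](
         "org.apache.spark.storage.BlockManagerMessages$RegisterBlockManager$"),
       ProblemFilters.exclude[DirectMissingMethodProblem](
         "org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.apply")
     )
   }
   ```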








[GitHub] [spark] mridulm commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1038742887


##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala:
##########
@@ -583,7 +586,12 @@ class BlockManagerMasterEndpoint(
 
     val time = System.currentTimeMillis()
     executorIdToLocalDirs.put(id.executorId, localDirs)
-    if (!blockManagerInfo.contains(id)) {
+    // SPARK-41360: For the block manager re-registration, we should only allow it when
+    // the executor is recognized as active by the scheduler backend. Otherwise, this kind
+    // of re-registration from the terminating/stopped executor is meaningless and harmful.
+    lazy val isExecutorAlive =
+      driverEndpoint.askSync[Boolean](CoarseGrainedClusterMessages.IsExecutorAlive(id.executorId))

Review Comment:
   We have to be careful with this change.
   The scheduler backend does call into the block manager master - but as of today, those are non-blocking calls. So this sync call is fine right now - but it could turn into a deadlock as the code evolves.





[GitHub] [spark] mridulm commented on pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
mridulm commented on PR #38876:
URL: https://github.com/apache/spark/pull/38876#issuecomment-1347178709

   Merged to master, branch-3.3 and branch-3.2
   Thanks for fixing this @Ngone51 !
   Thanks for the reviews @jiangxb1987 :-)




[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1047120075


##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala:
##########
@@ -63,7 +63,8 @@ private[spark] object BlockManagerMessages {
       localDirs: Array[String],
       maxOnHeapMemSize: Long,
       maxOffHeapMemSize: Long,
-      sender: RpcEndpointRef)
+      sender: RpcEndpointRef,
+      isReRegister: Boolean)

Review Comment:
   Thanks @HyukjinKwon 





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1047078201


##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala:
##########
@@ -63,7 +63,8 @@ private[spark] object BlockManagerMessages {
       localDirs: Array[String],
       maxOnHeapMemSize: Long,
       maxOffHeapMemSize: Long,
-      sender: RpcEndpointRef)
+      sender: RpcEndpointRef,
+      isReRegister: Boolean)

Review Comment:
   Let me just make a quick followup to fix this in all branches. This was detected with Scala 2.13 FWIW.





[GitHub] [spark] mridulm commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1038743250


##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala:
##########
@@ -616,10 +624,29 @@ class BlockManagerMasterEndpoint(
       if (pushBasedShuffleEnabled) {
         addMergerLocation(id)
       }
+      listenerBus.post(SparkListenerBlockManagerAdded(time, id,
+        maxOnHeapMemSize + maxOffHeapMemSize, Some(maxOnHeapMemSize), Some(maxOffHeapMemSize)))
     }
-    listenerBus.post(SparkListenerBlockManagerAdded(time, id, maxOnHeapMemSize + maxOffHeapMemSize,
-        Some(maxOnHeapMemSize), Some(maxOffHeapMemSize)))
-    id
+    val updatedId = if (isReRegister && !isExecutorAlive) {
+      assert(!blockManagerInfo.contains(id),
+        "BlockManager re-registration shouldn't succeed when the executor is lost")

Review Comment:
   Does this assertion need to always hold ?
   I will need to take another look at the code, but I vaguely think there are corner cases here ... might be good to check up on this. (I will too, next week.)





[GitHub] [spark] mridulm commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1039131641


##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala:
##########
@@ -583,7 +586,12 @@ class BlockManagerMasterEndpoint(
 
     val time = System.currentTimeMillis()
     executorIdToLocalDirs.put(id.executorId, localDirs)
-    if (!blockManagerInfo.contains(id)) {
+    // SPARK-41360: For the block manager re-registration, we should only allow it when
+    // the executor is recognized as active by the scheduler backend. Otherwise, this kind
+    // of re-registration from the terminating/stopped executor is meaningless and harmful.
+    lazy val isExecutorAlive =
+      driverEndpoint.askSync[Boolean](CoarseGrainedClusterMessages.IsExecutorAlive(id.executorId))
+    if (!blockManagerInfo.contains(id) && (!isReRegister || isExecutorAlive)) {

Review Comment:
   `ShutdownHookManager.inShutdown` should tell if it is in the process of shutting down - and prevent the call to reregister ?
   
   For the other cases - lost due to long GC, lost due to network partitions, etc. - they are legitimate candidates for re-registration.
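   
   For reference, a minimal sketch of that executor-side check in `BlockManager.reregister()` (this reflects the suggestion above, not necessarily what the PR ends up doing, and assumes `ShutdownHookManager.inShutdown()` is an acceptable signal here):
   
   ```scala
   import org.apache.spark.util.ShutdownHookManager
   
   def reregister(): Unit = {
     if (ShutdownHookManager.inShutdown()) {
       // The executor JVM is already shutting down; re-registering would only
       // re-add a block manager that is about to disappear anyway.
       logInfo(s"Skipping BlockManager $blockManagerId re-registration during shutdown")
       return
     }
     // ... existing re-registration logic ...
   }
   ```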







[GitHub] [spark] mridulm commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1040459276


##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala:
##########
@@ -583,7 +586,12 @@ class BlockManagerMasterEndpoint(
 
     val time = System.currentTimeMillis()
     executorIdToLocalDirs.put(id.executorId, localDirs)
-    if (!blockManagerInfo.contains(id)) {
+    // SPARK-41360: For the block manager re-registration, we should only allow it when
+    // the executor is recognized as active by the scheduler backend. Otherwise, this kind
+    // of re-registration from the terminating/stopped executor is meaningless and harmful.
+    lazy val isExecutorAlive =
+      driverEndpoint.askSync[Boolean](CoarseGrainedClusterMessages.IsExecutorAlive(id.executorId))
+    if (!blockManagerInfo.contains(id) && (!isReRegister || isExecutorAlive)) {

Review Comment:
   Wont the driver not remove in case of a heartbeat expiry even though the executor did not disconnect ? (for the same reasons as above - long gc pause, network partition, etc)
   
   (Sorry for the delay, I will get back to this PR later this week - not in good health)





[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1039195161


##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala:
##########
@@ -583,7 +586,12 @@ class BlockManagerMasterEndpoint(
 
     val time = System.currentTimeMillis()
     executorIdToLocalDirs.put(id.executorId, localDirs)
-    if (!blockManagerInfo.contains(id)) {
+    // SPARK-41360: For the block manager re-registration, we should only allow it when
+    // the executor is recognized as active by the scheduler backend. Otherwise, this kind
+    // of re-registration from the terminating/stopped executor is meaningless and harmful.
+    lazy val isExecutorAlive =
+      driverEndpoint.askSync[Boolean](CoarseGrainedClusterMessages.IsExecutorAlive(id.executorId))
+    if (!blockManagerInfo.contains(id) && (!isReRegister || isExecutorAlive)) {

Review Comment:
   > ShutdownHookManager.inShutdown should tell if it is in the process of shutting down - and prevent call to reregister ?
   
   OK, I see.
   
   > For the other cases, lost due to long GC, lost due to network partitions, etc - they are legitimate candidates for registration.
   
   Note that the re-registration handled in this PR has a prerequisite: the executor must already be considered lost from the driver's view. In that case, block manager re-registration is meaningless since the executor won't reconnect to the driver.







[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1043027754


##########
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##########
@@ -637,9 +637,11 @@ private[spark] class BlockManager(
   def reregister(): Unit = {
     // TODO: We might need to rate limit re-registering.
     logInfo(s"BlockManager $blockManagerId re-registering with master")
-    master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString, maxOnHeapMemory,
-      maxOffHeapMemory, storageEndpoint)
-    reportAllBlocks()
+    val id = master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString,
+      maxOnHeapMemory, maxOffHeapMemory, storageEndpoint, isReRegister = true)
+    if (id.executorId != BlockManagerId.INVALID_EXECUTOR_ID) {
+      reportAllBlocks()
+    }

Review Comment:
   Good question. So ideally, a lost executor should be terminated in the end anyway (whether killed by the driver or by exiting itself proactively)... if the lost executor fails to terminate, there must be something wrong with it. So I think terminating here won't make any difference to the result.





[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1044955782


##########
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##########
@@ -637,9 +637,11 @@ private[spark] class BlockManager(
   def reregister(): Unit = {
     // TODO: We might need to rate limit re-registering.
     logInfo(s"BlockManager $blockManagerId re-registering with master")
-    master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString, maxOnHeapMemory,
-      maxOffHeapMemory, storageEndpoint)
-    reportAllBlocks()
+    val id = master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString,
+      maxOnHeapMemory, maxOffHeapMemory, storageEndpoint, isReRegister = true)
+    if (id.executorId != BlockManagerId.INVALID_EXECUTOR_ID) {
+      reportAllBlocks()
+    }

Review Comment:
   Added the termination logic here. cc @mridulm 
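   
   (The exact termination logic added here is not quoted in this thread; a rough, hypothetical sketch of the idea, with the exit path being an assumption:)
   
   ```scala
   // Hypothetical sketch only -- the exact code added in the PR is not quoted here.
   val id = master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString,
     maxOnHeapMemory, maxOffHeapMemory, storageEndpoint, isReRegister = true)
   if (id.executorId != BlockManagerId.INVALID_EXECUTOR_ID) {
     reportAllBlocks()
   } else {
     // The driver already treats this executor as lost, so the re-registration was
     // rejected; there is no point keeping the orphaned executor JVM around.
     logError("Exiting executor due to a rejected BlockManager re-registration")
     System.exit(-1)  // placeholder exit path; the real change may use a dedicated exit code
   }
   ```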







[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1047082916


##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala:
##########
@@ -63,7 +63,8 @@ private[spark] object BlockManagerMessages {
       localDirs: Array[String],
       maxOnHeapMemSize: Long,
       maxOffHeapMemSize: Long,
-      sender: RpcEndpointRef)
+      sender: RpcEndpointRef,
+      isReRegister: Boolean)

Review Comment:
   Here: https://github.com/apache/spark/pull/39052





[GitHub] [spark] mridulm closed pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
mridulm closed pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost
URL: https://github.com/apache/spark/pull/38876




[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1039080517


##########
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala:
##########
@@ -583,7 +586,12 @@ class BlockManagerMasterEndpoint(
 
     val time = System.currentTimeMillis()
     executorIdToLocalDirs.put(id.executorId, localDirs)
-    if (!blockManagerInfo.contains(id)) {
+    // SPARK-41360: For the block manager re-registration, we should only allow it when
+    // the executor is recognized as active by the scheduler backend. Otherwise, this kind
+    // of re-registration from the terminating/stopped executor is meaningless and harmful.
+    lazy val isExecutorAlive =
+      driverEndpoint.askSync[Boolean](CoarseGrainedClusterMessages.IsExecutorAlive(id.executorId))
+    if (!blockManagerInfo.contains(id) && (!isReRegister || isExecutorAlive)) {

Review Comment:
   > If this is an issue only for terminating executors, we can detect that on the executor side and propagate it in the registration request, right?
   
   What do you mean? The terminating executor can detect itself as being terminating?
   
   Actually, it's not only the terminating executors: executors lost due to long GC, or executors that the driver failed to kill (where the executor could end up as an orphan rather than terminated), also fall under this, as long as the driver considers the executor lost.
   







[GitHub] [spark] mridulm commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1044130961


##########
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##########
@@ -637,9 +637,11 @@ private[spark] class BlockManager(
   def reregister(): Unit = {
     // TODO: We might need to rate limit re-registering.
     logInfo(s"BlockManager $blockManagerId re-registering with master")
-    master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString, maxOnHeapMemory,
-      maxOffHeapMemory, storageEndpoint)
-    reportAllBlocks()
+    val id = master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString,
+      maxOnHeapMemory, maxOffHeapMemory, storageEndpoint, isReRegister = true)
+    if (id.executorId != BlockManagerId.INVALID_EXECUTOR_ID) {
+      reportAllBlocks()
+    }

Review Comment:
   If it is failing to terminate, every heartbeat will end up returning `BlockManagerId.INVALID_EXECUTOR_ID` from the driver, right?
   Wondering if there is a reason to keep it around.
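   
   (For context, re-registration is driven from the executor's heartbeat loop, roughly as below; this is simplified from `Executor.reportHeartBeat`, so treat the details as approximate:)
   
   ```scala
   // Simplified: the driver's HeartbeatResponse tells an executor whose block manager
   // is unknown to re-register; with this PR that re-registration can now be rejected.
   val response = heartbeatReceiverRef.askSync[HeartbeatResponse](message)
   if (response.reregisterBlockManager) {
     logInfo("Told to re-register on heartbeat")
     env.blockManager.reregister()
   }
   ```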


