You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by yanghua <gi...@git.apache.org> on 2018/05/04 08:58:52 UTC
[GitHub] flink pull request #5954: [FLINK-9276] Improve error message when TaskManage...
GitHub user yanghua opened a pull request:
https://github.com/apache/flink/pull/5954
[FLINK-9276] Improve error message when TaskManager fails
## What is the purpose of the change
*This pull request improves error message when TaskManager fails*
## Brief change log
- *add a Exception param from method `SlotPoolGateway#releaseTaskManager`*
- *refactor the usage of method `SlotPoolGateway#releaseTaskManager`*
## Verifying this change
This change is already covered by existing tests, such as *SchedulerTestBase/SchedulerIsolatedTasks*.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)
## Documentation
- Does this pull request introduce a new feature? (yes / **no**)
- If yes, how is the feature documented? (not applicable / docs / JavaDocs / **not documented**)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yanghua/flink FLINK-9276
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5954.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5954
----
commit 904f762e6b495ea6b4ffd445b9168f001f74be26
Author: yanghua <ya...@...>
Date: 2018-05-04T08:52:38Z
[FLINK-9276] Improve error message when TaskManager fails
----
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by yanghua <gi...@git.apache.org>.
Github user yanghua commented on the issue:
https://github.com/apache/flink/pull/5954
@aljoscha @kl0u
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by yanghua <gi...@git.apache.org>.
Github user yanghua commented on the issue:
https://github.com/apache/flink/pull/5954
cc @StephanEwen refactored code base on your suggestion, please review again, thanks.
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/5954
This looks fine from my side.
Would like to get a +1 from @tillrohrmann before merging this...
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the issue:
https://github.com/apache/flink/pull/5954
Thanks for opening this PR @yanghua and sorry for my late reply.
I think we accidentally addressed this problem already with another PR (#6202). This was an oversight because you clearly opened the PR before. Since it basically covers the same functionality, I think we can close this PR.
Sorry for the bad PR management. This won't happen again.
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by yanghua <gi...@git.apache.org>.
Github user yanghua commented on the issue:
https://github.com/apache/flink/pull/5954
hi @StephanEwen this PR also reviewed by you, yesterday it has a conflicts with master branch, I have fixed, would you please review it again, thanks.
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by yanghua <gi...@git.apache.org>.
Github user yanghua commented on the issue:
https://github.com/apache/flink/pull/5954
cc @StephanEwen @zentol
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by yanghua <gi...@git.apache.org>.
Github user yanghua commented on the issue:
https://github.com/apache/flink/pull/5954
@tillrohrmann can you merge this?
---
[GitHub] flink pull request #5954: [FLINK-9276] Improve error message when TaskManage...
Posted by yanghua <gi...@git.apache.org>.
Github user yanghua closed the pull request at:
https://github.com/apache/flink/pull/5954
---
[GitHub] flink pull request #5954: [FLINK-9276] Improve error message when TaskManage...
Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/flink/pull/5954#discussion_r186375841
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPool.java ---
@@ -1050,11 +1050,12 @@ else if (availableSlots.tryRemove(allocationID)) {
* when we find some TaskManager becomes "dead" or "abnormal", and we decide to not using slots from it anymore.
*
* @param resourceId The id of the TaskManager
+ * @param cause for the release the TaskManager
*/
@Override
- public CompletableFuture<Acknowledge> releaseTaskManager(final ResourceID resourceId) {
+ public CompletableFuture<Acknowledge> releaseTaskManager(final ResourceID resourceId, final Exception cause) {
--- End diff --
I would use `Throwable` in the signatures. It may always be that some Error is the cause (class not found, etc.)
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by yanghua <gi...@git.apache.org>.
Github user yanghua commented on the issue:
https://github.com/apache/flink/pull/5954
@tillrohrmann That's all right. I know you are very busy. Just a little question, I have reviewed PR(#6202). I saw it used `Exception`, and there is a suggestion from Stephan in May 7 :
```
I would use Throwable in the signatures. It may always be that some Error is the cause (class not found, etc.)
```
So I replaced the `Exception ` to `Throwable` in this PR, do you think it can be consider? If not, I would close this PR.
---
[GitHub] flink pull request #5954: [FLINK-9276] Improve error message when TaskManage...
Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/flink/pull/5954#discussion_r186376182
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPoolGateway.java ---
@@ -86,9 +86,10 @@
* Releases a TaskExecutor with the given {@link ResourceID} from the {@link SlotPool}.
*
* @param resourceId identifying the TaskExecutor which shall be released from the SlotPool
+ * @param cause for the release the TaskManager
* @return Future acknowledge which is completed after the TaskExecutor has been released
*/
- CompletableFuture<Acknowledge> releaseTaskManager(final ResourceID resourceId);
+ CompletableFuture<Acknowledge> releaseTaskManager(final ResourceID resourceId, final Exception cause);
--- End diff --
Throwable, see above.
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the issue:
https://github.com/apache/flink/pull/5954
I think this is not super critical and since we only pass in `Exceptions` at the moment I would say let's close this PR.
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by yanghua <gi...@git.apache.org>.
Github user yanghua commented on the issue:
https://github.com/apache/flink/pull/5954
@zentol @tillrohrmann @aljoscha this PR takes a long time and has conflicts (I have rebased recently), hope for it could be merged as soon as possible.
---
[GitHub] flink pull request #5954: [FLINK-9276] Improve error message when TaskManage...
Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/flink/pull/5954#discussion_r186376326
--- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/jobmanager/scheduler/SchedulerTestBase.java ---
@@ -296,7 +296,7 @@ public TaskManagerLocation addTaskManager(int numberSlots) {
@Override
public void releaseTaskManager(ResourceID resourceId) {
try {
- slotPool.releaseTaskManager(resourceId).get();
+ slotPool.releaseTaskManager(resourceId, null).get();
--- End diff --
I would avoid null exceptions.
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by yanghua <gi...@git.apache.org>.
Github user yanghua commented on the issue:
https://github.com/apache/flink/pull/5954
@zentol @tillrohrmann @aljoscha this PR takes a long time and has conflicts (I have rebased recently), hope for it could be merged as soon as possible.
---
[GitHub] flink pull request #5954: [FLINK-9276] Improve error message when TaskManage...
Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/flink/pull/5954#discussion_r186376122
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPool.java ---
@@ -1070,13 +1071,22 @@ protected void timeoutPendingSlotRequest(SlotRequestId slotRequestId) {
removePendingRequest(slotRequestId);
}
- private void releaseTaskManagerInternal(final ResourceID resourceId) {
- final FlinkException cause = new FlinkException("Releasing TaskManager " + resourceId + '.');
+ private void releaseTaskManagerInternal(final ResourceID resourceId, final Exception cause) {
--- End diff --
I think we can just use the `cause` exception and do not need to re-wrap it into another exception.
If re-wrapping is needed for some reason, we need to add the original exception as the cause.
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by yanghua <gi...@git.apache.org>.
Github user yanghua commented on the issue:
https://github.com/apache/flink/pull/5954
@zentol can this be merged?
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by yanghua <gi...@git.apache.org>.
Github user yanghua commented on the issue:
https://github.com/apache/flink/pull/5954
OK, closing this PR...
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by yanghua <gi...@git.apache.org>.
Github user yanghua commented on the issue:
https://github.com/apache/flink/pull/5954
cc @StephanEwen
---
[GitHub] flink pull request #5954: [FLINK-9276] Improve error message when TaskManage...
Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/flink/pull/5954#discussion_r186375693
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPool.java ---
@@ -212,7 +212,7 @@ public void start(JobMasterId jobMasterId, String newJobManagerAddress) throws E
// release all registered slots by releasing the corresponding TaskExecutors
for (ResourceID taskManagerResourceId : registeredTaskManagers) {
- releaseTaskManagerInternal(taskManagerResourceId);
+ releaseTaskManagerInternal(taskManagerResourceId, null);
--- End diff --
I would try to avoid null here. It is still helpful to have a message that says "SlotPool is shutting down".
---
[GitHub] flink issue #5954: [FLINK-9276] Improve error message when TaskManager fails
Posted by yanghua <gi...@git.apache.org>.
Github user yanghua commented on the issue:
https://github.com/apache/flink/pull/5954
cc @StephanEwen
---