Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/10/04 22:07:00 UTC

[jira] [Commented] (FLINK-9932) If task executor offer slot to job master timeout the first time, the slot will leak

    [ https://issues.apache.org/jira/browse/FLINK-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638946#comment-16638946 ] 

ASF GitHub Bot commented on FLINK-9932:
---------------------------------------

isunjin commented on a change in pull request #6780: [FLINK-9932] [runtime] fix slot leak when task executor offer slot to job master timeout
URL: https://github.com/apache/flink/pull/6780#discussion_r222842321
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java
 ##########
 @@ -1076,6 +1076,19 @@ private void offerSlotsToJobManager(final JobID jobId) {
 							if (throwable instanceof TimeoutException) {
 								log.info("Slot offering to JobManager did not finish in time. Retrying the slot offering.");
 								// We ran into a timeout. Try again.
+								for (SlotOffer offer : reservedSlots) {
+									try {
+										if (!taskSlotTable.markSlotInactive(offer.getAllocationId(), taskManagerConfiguration.getTimeout())) {
+											// the slot is either free or releasing at the moment
+											final String message = "Could not mark slot " + jobId + " active.";
 
 Review comment:
   Merge the log into a single line.
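The hunk above reverts the reserved slots when the slot offer times out, so that the retry offers them again. The Java sketch below models that idea in isolation. It assumes, as the `markSlotInactive` call in the diff suggests, that marking a slot inactive moves it from ACTIVE back to ALLOCATED. `SlotOfferRetrySketch`, its methods, and the simulated timeout are hypothetical and are not Flink's API. The real TaskExecutor retries by calling offerSlotsToJobManager again rather than looping, and this sketch does not model the failure branch in which the slot is already free or releasing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeoutException;

// Hypothetical toy model of the fix under review: on a timed-out slot offer,
// mark the reserved slots inactive (ACTIVE -> ALLOCATED) before retrying.
// Names here are illustrative only and are not Flink's TaskSlotTable API.
public class SlotOfferRetrySketch {

    enum State { ALLOCATED, ACTIVE }

    private final Map<String, State> slots = new LinkedHashMap<>();
    private int timeoutsRemaining; // how many offer RPCs should still time out

    SlotOfferRetrySketch(int timeoutsToSimulate) {
        this.timeoutsRemaining = timeoutsToSimulate;
    }

    void allocate(String slotId) {
        slots.put(slotId, State.ALLOCATED);
    }

    State stateOf(String slotId) {
        return slots.get(slotId);
    }

    // Like the original offer path: take every ALLOCATED slot and mark it ACTIVE.
    private List<String> reserveAllocatedSlots() {
        List<String> reserved = new ArrayList<>();
        for (Map.Entry<String, State> e : slots.entrySet()) {
            if (e.getValue() == State.ALLOCATED) {
                e.setValue(State.ACTIVE);
                reserved.add(e.getKey());
            }
        }
        return reserved;
    }

    // Simulated offer RPC that times out while timeoutsRemaining > 0.
    private void sendOffer(List<String> reserved) throws TimeoutException {
        if (timeoutsRemaining > 0) {
            timeoutsRemaining--;
            throw new TimeoutException("simulated offer timeout for " + reserved);
        }
    }

    // Returns the slots whose offer went through, or an empty list if every attempt timed out.
    List<String> offerWithRetry(int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<String> reserved = reserveAllocatedSlots();
            try {
                sendOffer(reserved);
                return reserved;
            } catch (TimeoutException e) {
                // The step the diff adds: put each reserved slot back into the
                // ALLOCATED state so the next attempt picks it up again.
                for (String slotId : reserved) {
                    slots.put(slotId, State.ALLOCATED);
                }
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        SlotOfferRetrySketch executor = new SlotOfferRetrySketch(1); // first offer times out
        executor.allocate("slot-1");
        System.out.println("accepted: " + executor.offerWithRetry(3));
        System.out.println("slot-1 state: " + executor.stateOf("slot-1"));
    }
}
```

With one simulated timeout, the second attempt re-offers slot-1 and the program prints `accepted: [slot-1]` followed by `slot-1 state: ACTIVE`. Without the revert step in the catch block, the second attempt would find no ALLOCATED slots and would return nothing.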

----------------------------------------------------------------
This is an automated message from the Apache Git Service.


> If task executor offer slot to job master timeout the first time, the slot will leak
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-9932
>                 URL: https://issues.apache.org/jira/browse/FLINK-9932
>             Project: Flink
>          Issue Type: Bug
>          Components: Cluster Management
>    Affects Versions: 1.5.0
>            Reporter: shuai.xu
>            Assignee: shuai.xu
>            Priority: Major
>              Labels: pull-request-available
>
> When the task executor offers a slot to the job master, it first marks the slot as ACTIVE.
> If the offer call times out, the task executor calls offerSlotsToJobManager again,
> but that retry only offers slots in the ALLOCATED state. Because the slot has already been marked ACTIVE, it is never offered again, and the slot leaks.
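The leak described above can be reproduced with a small state machine. This Java sketch is a hypothetical model and not Flink code. `SlotLeakDemo`, `allocate`, and `offerAttempt` are invented names, and only the two states from the description (ALLOCATED and ACTIVE) are modelled. Each offer attempt considers only ALLOCATED slots and marks them ACTIVE before the offer completes, so after a timed-out first attempt the retry finds nothing to offer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reproduction of the FLINK-9932 leak; names are illustrative only.
public class SlotLeakDemo {

    enum State { ALLOCATED, ACTIVE }

    // Stand-in for the task executor's slot table: slot id -> state.
    private final Map<String, State> slots = new LinkedHashMap<>();

    void allocate(String slotId) {
        slots.put(slotId, State.ALLOCATED);
    }

    State stateOf(String slotId) {
        return slots.get(slotId);
    }

    // Mirrors the behaviour described in the issue: an offer attempt only
    // considers ALLOCATED slots and marks each one ACTIVE before the offer
    // RPC has completed. Returns the slot ids included in this attempt.
    List<String> offerAttempt() {
        List<String> offered = new ArrayList<>();
        for (Map.Entry<String, State> e : slots.entrySet()) {
            if (e.getValue() == State.ALLOCATED) {
                e.setValue(State.ACTIVE);
                offered.add(e.getKey());
            }
        }
        return offered;
    }

    public static void main(String[] args) {
        SlotLeakDemo table = new SlotLeakDemo();
        table.allocate("slot-1");
        List<String> first = table.offerAttempt(); // assume this offer times out
        List<String> retry = table.offerAttempt(); // retry after the timeout
        System.out.println("first attempt: " + first);
        System.out.println("retry attempt: " + retry);
        System.out.println("slot-1 state:  " + table.stateOf("slot-1"));
    }
}
```

Running `main` prints `first attempt: [slot-1]`, `retry attempt: []`, and `slot-1 state:  ACTIVE`. The slot stays ACTIVE without ever having been accepted by the job master, and no later retry picks it up.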



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)