Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/12/15 16:57:47 UTC

[GitHub] [flink] zentol opened a new pull request #14393: [FLINK-20605][coordination] Rework cancellation of slot allocation futures

zentol opened a new pull request #14393:
URL: https://github.com/apache/flink/pull/14393


   Properly skip the processing of slot allocations that have already concluded. Relying on the cancellation of futures for this is unreliable when async operations are involved, because other operations may occur in between. Instead, pending allocations are now tracked in a set, which is checked at the start of processing.
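
The problem described above can be illustrated with a minimal, self-contained sketch. It is not the Flink implementation: the class `PendingAllocationSketch`, the method `processedAfterCancel`, the string slot ID, and the queue-backed executor standing in for the main thread are all assumptions made for illustration. Only the name `pendingSlotAllocations` and the use of `handleAsync` are taken from the diff in this pull request. The sketch completes a request future, attempts a cancellation while the async handler is still queued, and then checks whether the handler processes the response anyway.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical illustration only; none of this is Flink code.
final class PendingAllocationSketch {

    /**
     * Completes a request, cancels the allocation while the async handler is
     * still queued, then runs the handler. Returns true if the handler
     * processed the response despite the cancellation.
     */
    public static boolean processedAfterCancel(boolean trackPendingInSet) {
        // Stand-in for a single-threaded "main thread" executor: tasks are
        // queued and only run when the queue is drained below.
        Queue<Runnable> mainThreadQueue = new ArrayDeque<>();
        Executor mainThreadExecutor = mainThreadQueue::add;

        Set<String> pendingSlotAllocations = new HashSet<>();
        String slotId = "slot-1"; // assumed ID type, for illustration
        boolean[] processed = {false};

        CompletableFuture<String> requestFuture = new CompletableFuture<>();
        pendingSlotAllocations.add(slotId);

        requestFuture.handleAsync(
                (acknowledge, throwable) -> {
                    // Check at the start of processing whether the allocation
                    // is still pending; skip the response otherwise.
                    if (trackPendingInSet && !pendingSlotAllocations.contains(slotId)) {
                        return null;
                    }
                    processed[0] = true;
                    return null;
                },
                mainThreadExecutor);

        // The response arrives: the handler is queued but has not run yet.
        requestFuture.complete("ACK");

        // In between, the allocation is cancelled. Cancelling a future that is
        // already completed has no effect, so only the set records it.
        requestFuture.cancel(false);
        pendingSlotAllocations.remove(slotId);

        // The "main thread" now gets around to running the queued handler.
        Runnable task;
        while ((task = mainThreadQueue.poll()) != null) {
            task.run();
        }
        return processed[0];
    }

    public static void main(String[] args) {
        System.out.println("cancellation only: processed = " + processedAfterCancel(false));
        System.out.println("pending set:       processed = " + processedAfterCancel(true));
    }
}
```

In this sketch the handler that relies on cancellation alone still processes the stale response, while the handler that consults the set skips it. The sketch cancels the request future directly for brevity; the code being replaced cancelled an intermediate future, but the timing issue is the same.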
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #14393: [FLINK-20605][coordination] Rework cancellation of slot allocation futures

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14393:
URL: https://github.com/apache/flink/pull/14393#issuecomment-745440486


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e449ce26f5e0b80a6e46810bbbffc19d09f5f59b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10901",
       "triggerID" : "e449ce26f5e0b80a6e46810bbbffc19d09f5f59b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e2a9c7f68dc51ad9d1d4a0595bb64f8c2012b863",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10948",
       "triggerID" : "e2a9c7f68dc51ad9d1d4a0595bb64f8c2012b863",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e31004ae37a8ffc1bb8916db18a471c7c037be2b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10970",
       "triggerID" : "e31004ae37a8ffc1bb8916db18a471c7c037be2b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e2a9c7f68dc51ad9d1d4a0595bb64f8c2012b863 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10948) 
   * e31004ae37a8ffc1bb8916db18a471c7c037be2b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10970) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot commented on pull request #14393: [FLINK-20605][coordination] Rework cancellation of slot allocation futures

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #14393:
URL: https://github.com/apache/flink/pull/14393#issuecomment-745440486


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e449ce26f5e0b80a6e46810bbbffc19d09f5f59b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e449ce26f5e0b80a6e46810bbbffc19d09f5f59b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e449ce26f5e0b80a6e46810bbbffc19d09f5f59b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #14393: [FLINK-20605][coordination] Rework cancellation of slot allocation futures

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14393:
URL: https://github.com/apache/flink/pull/14393#issuecomment-745440486


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e449ce26f5e0b80a6e46810bbbffc19d09f5f59b",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10901",
       "triggerID" : "e449ce26f5e0b80a6e46810bbbffc19d09f5f59b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e449ce26f5e0b80a6e46810bbbffc19d09f5f59b Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10901) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zentol merged pull request #14393: [FLINK-20605][coordination] Rework cancellation of slot allocation futures

Posted by GitBox <gi...@apache.org>.
zentol merged pull request #14393:
URL: https://github.com/apache/flink/pull/14393


   





[GitHub] [flink] flinkbot edited a comment on pull request #14393: [FLINK-20605][coordination] Rework cancellation of slot allocation futures

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14393:
URL: https://github.com/apache/flink/pull/14393#issuecomment-745440486


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e449ce26f5e0b80a6e46810bbbffc19d09f5f59b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10901",
       "triggerID" : "e449ce26f5e0b80a6e46810bbbffc19d09f5f59b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e449ce26f5e0b80a6e46810bbbffc19d09f5f59b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10901) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #14393: [FLINK-20605][coordination] Rework cancellation of slot allocation futures

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14393:
URL: https://github.com/apache/flink/pull/14393#issuecomment-745440486


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e449ce26f5e0b80a6e46810bbbffc19d09f5f59b",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10901",
       "triggerID" : "e449ce26f5e0b80a6e46810bbbffc19d09f5f59b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e2a9c7f68dc51ad9d1d4a0595bb64f8c2012b863",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e2a9c7f68dc51ad9d1d4a0595bb64f8c2012b863",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e449ce26f5e0b80a6e46810bbbffc19d09f5f59b Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10901) 
   * e2a9c7f68dc51ad9d1d4a0595bb64f8c2012b863 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] tillrohrmann commented on a change in pull request #14393: [FLINK-20605][coordination] Rework cancellation of slot allocation futures

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #14393:
URL: https://github.com/apache/flink/pull/14393#discussion_r544412421



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DeclarativeSlotManager.java
##########
@@ -505,17 +496,12 @@ private void allocateSlot(TaskManagerSlotInformation taskManagerSlot, JobID jobI
 			resourceManagerId,
 			taskManagerRequestTimeout);
 
-		requestFuture.whenComplete(
+		CompletableFuture<Void> slotAllocationResponseProcessingFuture = requestFuture.handleAsync(
 			(Acknowledge acknowledge, Throwable throwable) -> {
-				if (acknowledge != null) {
-					completableFuture.complete(acknowledge);
-				} else {
-					completableFuture.completeExceptionally(throwable);
+				if (!pendingSlotAllocations.contains(slotId)) {
+					LOG.debug("Ignoring slot allocation update from task executor {} for slot {} and job {}, because the allocation was already completed or cancelled.", instanceId, slotId, jobId);
+					return null;
 				}
-			});
-
-		CompletableFuture<Void> slotAllocationResponseProcessingFuture = completableFuture.handleAsync(
-			(Acknowledge acknowledge, Throwable throwable) -> {
 				if (acknowledge != null) {
 					LOG.trace("Completed allocation of slot {} for job {}.", slotId, jobId);
 					slotTracker.notifyAllocationComplete(slotId, jobId);

Review comment:
       Do we still have to distinguish between `CancellationException` and others in the else branch?







[GitHub] [flink] zentol commented on a change in pull request #14393: [FLINK-20605][coordination] Rework cancellation of slot allocation futures

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #14393:
URL: https://github.com/apache/flink/pull/14393#discussion_r544677826



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DeclarativeSlotManager.java
##########
@@ -505,17 +496,12 @@ private void allocateSlot(TaskManagerSlotInformation taskManagerSlot, JobID jobI
 			resourceManagerId,
 			taskManagerRequestTimeout);
 
-		requestFuture.whenComplete(
+		CompletableFuture<Void> slotAllocationResponseProcessingFuture = requestFuture.handleAsync(
 			(Acknowledge acknowledge, Throwable throwable) -> {
-				if (acknowledge != null) {
-					completableFuture.complete(acknowledge);
-				} else {
-					completableFuture.completeExceptionally(throwable);
+				if (!pendingSlotAllocations.contains(slotId)) {
+					LOG.debug("Ignoring slot allocation update from task executor {} for slot {} and job {}, because the allocation was already completed or cancelled.", instanceId, slotId, jobId);
+					return null;
 				}
-			});
-
-		CompletableFuture<Void> slotAllocationResponseProcessingFuture = completableFuture.handleAsync(
-			(Acknowledge acknowledge, Throwable throwable) -> {
 				if (acknowledge != null) {
 					LOG.trace("Completed allocation of slot {} for job {}.", slotId, jobId);
 					slotTracker.notifyAllocationComplete(slotId, jobId);

Review comment:
       hmmmm.....no, it should no longer be possible for the future to be canceled.







[GitHub] [flink] flinkbot commented on pull request #14393: [FLINK-20605][coordination] Rework cancellation of slot allocation futures

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #14393:
URL: https://github.com/apache/flink/pull/14393#issuecomment-745426313


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit e449ce26f5e0b80a6e46810bbbffc19d09f5f59b (Tue Dec 15 17:00:18 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
   
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #14393: [FLINK-20605][coordination] Rework cancellation of slot allocation futures

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14393:
URL: https://github.com/apache/flink/pull/14393#issuecomment-745440486


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e449ce26f5e0b80a6e46810bbbffc19d09f5f59b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10901",
       "triggerID" : "e449ce26f5e0b80a6e46810bbbffc19d09f5f59b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e2a9c7f68dc51ad9d1d4a0595bb64f8c2012b863",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10948",
       "triggerID" : "e2a9c7f68dc51ad9d1d4a0595bb64f8c2012b863",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e31004ae37a8ffc1bb8916db18a471c7c037be2b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e31004ae37a8ffc1bb8916db18a471c7c037be2b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e2a9c7f68dc51ad9d1d4a0595bb64f8c2012b863 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10948) 
   * e31004ae37a8ffc1bb8916db18a471c7c037be2b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

