Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/05/28 07:49:47 UTC

[GitHub] [flink] zhuzhurk opened a new pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

zhuzhurk opened a new pull request #12375:
URL: https://github.com/apache/flink/pull/12375


   ## What is the purpose of the change
   
   This PR introduces a `BulkSlotProvider` that supports bulk slot allocation. With it, we can check whether the resource requirements of all slot requests in a bulk can be fulfilled at the same time.
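
As a rough illustration of what "fulfilled at the same time" means, the sketch below checks whether every request in a bulk can be matched to its own distinct slot. This is not the code from the PR: `Resources`, `BulkFulfillabilitySketch`, and `canFulfillTogether` are hypothetical names, and the greedy first-fit matching is an assumed simplification of whatever the real check does on Flink's `ResourceProfile`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified stand-in for a resource profile (not Flink's ResourceProfile). */
final class Resources {
    final int cpuCores;
    final int memoryMb;

    Resources(int cpuCores, int memoryMb) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
    }

    /** True if this slot offers at least what the request requires. */
    boolean satisfies(Resources required) {
        return cpuCores >= required.cpuCores && memoryMb >= required.memoryMb;
    }
}

final class BulkFulfillabilitySketch {

    /**
     * Returns true if each request can be given a distinct slot.
     * Greedy first-fit: a slot is consumed by the first request it satisfies,
     * so this can report false even though a smarter assignment exists.
     */
    static boolean canFulfillTogether(List<Resources> requests, List<Resources> slots) {
        List<Resources> remaining = new ArrayList<>(slots);
        for (Resources request : requests) {
            boolean matched = false;
            Iterator<Resources> it = remaining.iterator();
            while (it.hasNext()) {
                if (it.next().satisfies(request)) {
                    it.remove(); // a slot can serve only one request of the bulk
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                return false; // one unmatched request makes the whole bulk unfulfillable
            }
        }
        return true;
    }
}
```

The point of checking the bulk as a whole is that two requests which are each satisfiable in isolation may still compete for the same slot.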
   
   ## Brief change log
   
     - *Made it possible to set and get whether a physical slot will be occupied indefinitely*
     - *Introduced BulkSlotProvider and its default implementation*
   
   ## Verifying this change
   
     - *Added AllocatedSlotOccupationTest*
     - *Added BulkSlotAllocationTest*
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (**yes** / no / don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / **no**)
     - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r434967448



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPoolImpl.java
##########
@@ -447,12 +452,40 @@ public void releaseSlot(@Nonnull SlotRequestId slotRequestId, @Nullable Throwabl
 
 		componentMainThreadExecutor.assertRunningInMainThread();
 
+		checkState(batchSlotTimeoutCheckEnabled, "batch slot timeout check is disabled unexpectedly.");
+
 		final PendingRequest pendingRequest = PendingRequest.createBatchRequest(slotRequestId, resourceProfile);
 
 		return requestNewAllocatedSlotInternal(pendingRequest)
 			.thenApply(Function.identity());
 	}
 
+	@Override
+	public CompletableFuture<PhysicalSlot> requestNewAllocatedSlotWithoutTimeout(
+			final SlotRequestId slotRequestId,
+			final ResourceProfile resourceProfile,
+			final boolean isBatchRequest) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		// a slot pool can serve either this kind of request or requestNewAllocatedBatchSlot(...), never both
+		disableBatchSlotTimeoutCheck();

Review comment:
       We can remove this method.







[GitHub] [flink] azagrebin commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435232158



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotProvider.java
##########
@@ -91,6 +92,20 @@
 			allocationTimeout);
 	}
 
+	/**
+	 * Allocates a bulk of physical slots. The allocation will be completed
+	 * normally only when all the requests are fulfilled.
+	 *
+	 * @param physicalSlotRequests requests for physical slots
+	 * @param timeout indicating how long it is accepted that the slot requests can be unfulfillable
+	 * @return future of the results of slot requests
+	 */
+	default CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(

Review comment:
       I think having a default method is fine. My comment is more about making `OneSlotPerExecutionSlotAllocator` depend only on `BulkSlotProvider`.
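
To make the trade-off concrete, here is a minimal sketch of the two shapes being discussed: a narrow bulk-only interface that a caller could depend on, and a wider interface that supplies the bulk call as a default method. All names (`BulkProviderSketch`, `SingleProviderSketch`, `allocate`, `allocateAll`) are hypothetical, and the default body built on `CompletableFuture.allOf` is an assumption made for illustration, not the body of the default method in this PR.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical narrow interface: all that a bulk-oriented caller needs to see. */
interface BulkProviderSketch<R, S> {
    CompletableFuture<Collection<S>> allocateAll(Collection<R> requests);
}

/** Hypothetical wider interface that offers the bulk call as a default method. */
interface SingleProviderSketch<R, S> extends BulkProviderSketch<R, S> {

    CompletableFuture<S> allocate(R request);

    /** Completes normally only when every single request has been fulfilled. */
    @Override
    default CompletableFuture<Collection<S>> allocateAll(Collection<R> requests) {
        List<CompletableFuture<S>> futures = new ArrayList<>();
        for (R request : requests) {
            futures.add(allocate(request));
        }
        return CompletableFuture
            .allOf(futures.toArray(new CompletableFuture<?>[0]))
            .thenApply(ignored -> {
                // allOf has completed normally, so join() returns without blocking
                Collection<S> results = new ArrayList<>();
                for (CompletableFuture<S> future : futures) {
                    results.add(future.join());
                }
                return results;
            });
    }
}
```

A caller typed against `BulkProviderSketch` cannot reach `allocate` at all, which is the kind of dependency narrowing the comment above is after.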







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * a8902557952ea70746d27e88a392c74724784605 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2848) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * ff97aeedb303fe540ebb3728ed280de414606978 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2818) 
   * a8902557952ea70746d27e88a392c74724784605 UNKNOWN
   



[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * 3371da5790134088bffa988c5a837c34d5d9a443 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2621) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * d3f81cb67e5924bfc9d3e7bbba25af281e6b3102 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3230) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * 40df0ed6d44b0e9cd11faa7707490045b48151b0 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2514) 
   * 3371da5790134088bffa988c5a837c34d5d9a443 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2621) 
   



[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438016144



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTracker.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.util.clock.Clock;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.IdentityHashMap;
+import java.util.Optional;
+import java.util.Set;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * Tracks physical slot request bulks. Once a bulk is registered, a fulfill-ability check for it will be scheduled.
+ */
+class PhysicalSlotRequestBulkTracker {
+
+	private final Supplier<Set<SlotInfo>> slotsRetriever;
+
+	private final Clock clock;
+
+	private final Set<PhysicalSlotRequestBulk> slotRequestBulks;

Review comment:
       Done via d3f81cb67e5924bfc9d3e7bbba25af281e6b3102.
   `PhysicalSlotRequestBulkTracker#createPhysicalSlotRequestBulk` is introduced so that we can reset the unfulfillable timestamp for a newly created bulk with the clock of `PhysicalSlotRequestBulkTracker`.
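
The idea of creating bulks through the tracker, so that their timestamps come from the tracker's own clock, could look roughly like the sketch below. `BulkSketch`, `BulkTrackerSketch`, `createBulk`, and `isTimedOut` are hypothetical names, and a `LongSupplier` stands in for Flink's `Clock`; none of this is taken from the actual commit.

```java
import java.util.function.LongSupplier;

/** Hypothetical bulk that remembers since when it has been unfulfillable. */
final class BulkSketch {
    private long unfulfillableSinceMillis;

    BulkSketch(long createdAtMillis) {
        this.unfulfillableSinceMillis = createdAtMillis;
    }

    long getUnfulfillableSinceMillis() {
        return unfulfillableSinceMillis;
    }

    void resetUnfulfillableSince(long nowMillis) {
        this.unfulfillableSinceMillis = nowMillis;
    }
}

/** Hypothetical tracker that owns the clock used for every timestamp. */
final class BulkTrackerSketch {
    private final LongSupplier clock; // stand-in for a Clock; returns milliseconds

    BulkTrackerSketch(LongSupplier clock) {
        this.clock = clock;
    }

    /** Factory method: the new bulk starts its timer at the tracker's current time. */
    BulkSketch createBulk() {
        return new BulkSketch(clock.getAsLong());
    }

    /** True once the bulk has been unfulfillable for at least timeoutMillis. */
    boolean isTimedOut(BulkSketch bulk, long timeoutMillis) {
        return clock.getAsLong() - bulk.getUnfulfillableSinceMillis() >= timeoutMillis;
    }
}
```

Because creation and the timeout check read the same clock, a test can inject a manually advanced clock and get deterministic timeout behavior.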







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * 3371da5790134088bffa988c5a837c34d5d9a443 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2621) 
   * 35ef82ee0bf1d008719a52e8182f9b63f3eddb11 UNKNOWN
   



[GitHub] [flink] azagrebin commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435366834



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this(slotSelectionStrategy, slotPool, SystemClock.getInstance());
+	}
+
+	@VisibleForTesting
+	BulkSlotProviderImpl(
+			final SlotSelectionStrategy slotSelectionStrategy,
+			final SlotPool slotPool,
+			final Clock clock) {
+
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+		this.clock = checkNotNull(clock);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(clock);
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request, timeout).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest,
+			final Time timeout) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else {
+			slotFuture = slotPool.requestNewAllocatedSlotWithoutTimeout(

Review comment:
       The problem may be with `UnfulfillableSlotRequestException`, where there is no point in retrying; see also [comment](https://github.com/apache/flink/pull/12278/commits/7065a71c2911aa3938c827bdb9029dee9268950c#r435362084). It looks like this might need more thought and, if possible, a simpler design.







[GitHub] [flink] azagrebin commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438643603



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(
+			this::getAllSlotInfos,
+			SystemClock.getInstance());
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public void cancelSlotRequest(SlotRequestId slotRequestId, Throwable cause) {
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		slotPool.releaseSlot(slotRequestId, cause);
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else if (physicalSlotRequest.willSlotBeOccupiedIndefinitely()) {
+			slotFuture = slotPool.requestNewAllocatedSlot(
+				slotRequestId,
+				resourceProfile,
+				null);
+		} else {
+			slotFuture = slotPool.requestNewAllocatedBatchSlot(
+				slotRequestId,
+				resourceProfile);
+		}
+
+		return slotFuture.thenApply(physicalSlot -> new PhysicalSlotRequest.Result(slotRequestId, physicalSlot));
+	}
+
+	private Optional<PhysicalSlot> tryAllocateFromAvailable(
+			final SlotRequestId slotRequestId,
+			final SlotProfile slotProfile) {
+
+		final Collection<SlotSelectionStrategy.SlotInfoAndResources> slotInfoList =
+			slotPool.getAvailableSlotsInformation()
+				.stream()
+				.map(SlotSelectionStrategy.SlotInfoAndResources::fromSingleSlot)
+				.collect(Collectors.toList());
+
+		final Optional<SlotSelectionStrategy.SlotInfoAndLocality> selectedAvailableSlot =
+			slotSelectionStrategy.selectBestSlotForProfile(slotInfoList, slotProfile);
+
+		return selectedAvailableSlot.flatMap(
+			slotInfoAndLocality -> slotPool.allocateAvailableSlot(
+				slotRequestId,
+				slotInfoAndLocality.getSlotInfo().getAllocationId())
+		);
+	}
+
+	private void schedulePendingRequestBulkTimeoutCheck(
+			final PhysicalSlotRequestBulk slotRequestBulk,
+			final Time timeout) {
+
+		componentMainThreadExecutor.schedule(() -> {
+			final PhysicalSlotRequestBulkTracker.TimeoutCheckResult result =
+				slotRequestBulkTracker.checkPhysicalSlotRequestBulkTimeout(slotRequestBulk, timeout);
+
+			switch (result) {
+				case PENDING:
+					//re-schedule the timeout check
+					schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+					break;
+				case TIMEOUT:
+					timeoutSlotRequestBulk(slotRequestBulk);
+					break;
+				default: // no action to take
+			}
+		}, timeout.getSize(), timeout.getUnit());
+	}
+
+	private void timeoutSlotRequestBulk(final PhysicalSlotRequestBulk slotRequestBulk) {
+		final Exception cause = new TimeoutException("Slot request bulk is not fulfillable!");
+		// pending requests must be canceled first otherwise they might be fulfilled by
+		// allocated slots released from this bulk
+		for (SlotRequestId slotRequestId : slotRequestBulk.getPendingRequests().keySet()) {
+			cancelSlotRequest(slotRequestId, cause);
+		}
+		for (SlotRequestId slotRequestId : slotRequestBulk.getFulfilledRequests().keySet()) {
+			cancelSlotRequest(slotRequestId, cause);
+		}
+	}
+
+	private Set<SlotInfo> getAllSlotInfos() {
+		return Stream
+			.concat(
+				slotPool.getAvailableSlotsInformation().stream(),
+				slotPool.getAllocatedSlotsInformation().stream())
+			.collect(Collectors.toSet());
+	}

Review comment:
       I had thought of separating slot request submission (`BulkSlotProviderImpl`) from tracking (`PhysicalSlotRequestBulkTracker`). On the other hand, the current `PhysicalSlotRequestBulkTracker` is easier to test without a `SlotPool`. Thanks for the explanation.
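
The testability point above can be sketched as follows. This is a minimal illustration, not the tracker from the PR: `SimpleBulkTracker`, its `isTimedOut` method, and the use of plain strings for slots are all hypothetical. It assumes only what the quoted diff shows, namely that the tracker receives a supplier of slot information (`this::getAllSlotInfos`) and a clock through its constructor, so a test can pass lambdas instead of a real `SlotPool`.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Demo entry point: drives the tracker with a hand-controlled clock and slot set.
class SimpleBulkTrackerDemo {
	public static void main(String[] args) {
		final long[] now = {0L};
		final Set<String> slots = new HashSet<>();
		final SimpleBulkTracker tracker = new SimpleBulkTracker(() -> slots, () -> now[0]);

		now[0] = 100L;
		System.out.println(tracker.isTimedOut(1, 100L)); // prints true: no slot for 100 ms

		slots.add("slot-1");
		System.out.println(tracker.isTimedOut(1, 100L)); // prints false: a slot is known now
	}
}

// Hypothetical, simplified tracker; NOT the PhysicalSlotRequestBulkTracker from the PR.
class SimpleBulkTracker {

	// Stands in for the slot info supplier (this::getAllSlotInfos in the diff).
	private final Supplier<Set<String>> slotsSupplier;

	// Stands in for the injected clock; returns the current time in milliseconds.
	private final LongSupplier clock;

	// Last time at which enough slots were known to fulfill the bulk.
	private long lastFulfillableMillis;

	SimpleBulkTracker(Supplier<Set<String>> slotsSupplier, LongSupplier clock) {
		this.slotsSupplier = slotsSupplier;
		this.clock = clock;
		this.lastFulfillableMillis = clock.getAsLong();
	}

	// Returns true once the bulk has been unfulfillable for at least timeoutMillis.
	boolean isTimedOut(int requiredSlots, long timeoutMillis) {
		final long now = clock.getAsLong();
		if (slotsSupplier.get().size() >= requiredSlots) {
			lastFulfillableMillis = now;
			return false;
		}
		return now - lastFulfillableMillis >= timeoutMillis;
	}
}
```

Because both dependencies are functional interfaces, the test controls time and slot availability directly, which is the benefit the comment attributes to keeping the tracker independent of `SlotPool`.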







[GitHub] [flink] flinkbot commented on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635177408


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 4fe765587c0423b2d29d5dde946b94edff79398b (Thu May 28 07:51:24 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
   <details>
   <summary>Bot commands</summary>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * a758c3a88a0a6cc2e0b183b20a17fe618f8cbbb9 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2396) 
   * 395c6cba8c463cffe2cf3cc70d91e6c6ff0dbb9f Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2420) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435845769



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this(slotSelectionStrategy, slotPool, SystemClock.getInstance());
+	}
+
+	@VisibleForTesting
+	BulkSlotProviderImpl(
+			final SlotSelectionStrategy slotSelectionStrategy,
+			final SlotPool slotPool,
+			final Clock clock) {
+
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+		this.clock = checkNotNull(clock);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(clock);
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request, timeout).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest,
+			final Time timeout) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else {
+			slotFuture = slotPool.requestNewAllocatedSlotWithoutTimeout(
+				slotRequestId,
+				resourceProfile,
+				!physicalSlotRequest.willSlotBeOccupiedIndefinitely());
+		}
+
+		return slotFuture.thenApply(physicalSlot -> new PhysicalSlotRequest.Result(slotRequestId, physicalSlot));
+	}
+
+	private Optional<PhysicalSlot> tryAllocateFromAvailable(
+			final SlotRequestId slotRequestId,
+			final SlotProfile slotProfile) {
+
+		final Collection<SlotSelectionStrategy.SlotInfoAndResources> slotInfoList =
+			slotPool.getAvailableSlotsInformation()
+				.stream()
+				.map(SlotSelectionStrategy.SlotInfoAndResources::fromSingleSlot)
+				.collect(Collectors.toList());
+
+		final Optional<SlotSelectionStrategy.SlotInfoAndLocality> selectedAvailableSlot =
+			slotSelectionStrategy.selectBestSlotForProfile(slotInfoList, slotProfile);
+
+		return selectedAvailableSlot.flatMap(
+			slotInfoAndLocality -> slotPool.allocateAvailableSlot(
+				slotRequestId,
+				slotInfoAndLocality.getSlotInfo().getAllocationId())
+		);
+	}
+
+	private void schedulePendingRequestBulkTimeoutCheck(
+		final PhysicalSlotRequestBulk slotRequestBulk,
+		final Time timeout) {
+
+		componentMainThreadExecutor.schedule(() -> {
+			if (!checkPhysicalSlotRequestBulkTimeout(slotRequestBulk, timeout)) {
+				schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+			}
+		}, timeout.getSize(), timeout.getUnit());
+	}
+
+	/**
+	 * Check the slot request bulk and time out its requests if it has been unfulfillable for too long.
+	 * @param slotRequestBulk bulk of slot requests
+	 * @param slotRequestTimeout indicates how long a pending request can be unfulfillable
+	 * @return true if the slot request bulk is done or timed out, otherwise false
+	 */
+	@VisibleForTesting
+	protected boolean checkPhysicalSlotRequestBulkTimeout(

Review comment:
       5. `BulkSlotProvider#cancelSlotRequest(...)` is added







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r434964730



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotProvider.java
##########
@@ -91,6 +92,20 @@
 			allocationTimeout);
 	}
 
+	/**
+	 * Allocates a bulk of physical slots. The allocation will be completed
+	 * normally only when all the requests are fulfilled.
+	 *
+	 * @param physicalSlotRequests requests for physical slots
+	 * @param timeout indicating how long it is accepted that the slot requests can be unfulfillable
+	 * @return future of the results of slot requests
+	 */
+	default CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(

Review comment:
       I think that might not be good, since we would still need to declare the default body here to avoid implementing this method in all `SlotProvider` implementations.
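
The role of the default body can be illustrated with a small sketch. Everything here is hypothetical and simplified: `SimpleSlotProvider`, `LegacySlotProvider`, and `BulkCapableSlotProvider` are stand-ins rather than Flink classes, slots are plain strings, and throwing `UnsupportedOperationException` from the default body is an assumption, since the quoted diff is cut off before the body. The sketch shows only the Java mechanism: a `default` method lets existing implementations compile unchanged while new ones override it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.CompletableFuture;

// Demo entry point: calls the new method on an old and a new implementation.
class DefaultMethodDemo {
	public static void main(String[] args) {
		final SimpleSlotProvider legacy = new LegacySlotProvider();
		try {
			legacy.allocateSlots(Arrays.asList("r1", "r2"));
		} catch (UnsupportedOperationException e) {
			System.out.println("legacy: " + e.getMessage());
		}

		final SimpleSlotProvider bulk = new BulkCapableSlotProvider();
		System.out.println("bulk: " + bulk.allocateSlots(Arrays.asList("r1", "r2")).join());
	}
}

// Hypothetical, simplified stand-in for SlotProvider; not the Flink interface.
interface SimpleSlotProvider {

	// Pre-existing abstract method that every implementation already provides.
	CompletableFuture<String> allocateSlot(String requestId);

	// Newly added method. The default body (assumed here to throw) keeps
	// existing implementations compiling without any change.
	default CompletableFuture<Collection<String>> allocateSlots(Collection<String> requestIds) {
		throw new UnsupportedOperationException("Bulk allocation is not supported.");
	}
}

// An existing implementation that is unaware of the new method.
class LegacySlotProvider implements SimpleSlotProvider {
	@Override
	public CompletableFuture<String> allocateSlot(String requestId) {
		return CompletableFuture.completedFuture("slot-for-" + requestId);
	}
}

// An implementation that opts in to bulk allocation by overriding the default.
class BulkCapableSlotProvider extends LegacySlotProvider {
	@Override
	public CompletableFuture<Collection<String>> allocateSlots(Collection<String> requestIds) {
		final Collection<String> slots = new ArrayList<>();
		for (String requestId : requestIds) {
			slots.add("slot-for-" + requestId);
		}
		return CompletableFuture.completedFuture(slots);
	}
}
```

Without the default body, `LegacySlotProvider` and every other implementation would have to add the method before the code compiles, which is the cost the comment refers to.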







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438606184



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(
+			this::getAllSlotInfos,
+			SystemClock.getInstance());
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public void cancelSlotRequest(SlotRequestId slotRequestId, Throwable cause) {
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		slotPool.releaseSlot(slotRequestId, cause);
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else if (physicalSlotRequest.willSlotBeOccupiedIndefinitely()) {
+			slotFuture = slotPool.requestNewAllocatedSlot(
+				slotRequestId,
+				resourceProfile,
+				null);
+		} else {
+			slotFuture = slotPool.requestNewAllocatedBatchSlot(
+				slotRequestId,
+				resourceProfile);
+		}
+
+		return slotFuture.thenApply(physicalSlot -> new PhysicalSlotRequest.Result(slotRequestId, physicalSlot));
+	}
+
+	private Optional<PhysicalSlot> tryAllocateFromAvailable(
+			final SlotRequestId slotRequestId,
+			final SlotProfile slotProfile) {
+
+		final Collection<SlotSelectionStrategy.SlotInfoAndResources> slotInfoList =
+			slotPool.getAvailableSlotsInformation()
+				.stream()
+				.map(SlotSelectionStrategy.SlotInfoAndResources::fromSingleSlot)
+				.collect(Collectors.toList());
+
+		final Optional<SlotSelectionStrategy.SlotInfoAndLocality> selectedAvailableSlot =
+			slotSelectionStrategy.selectBestSlotForProfile(slotInfoList, slotProfile);
+
+		return selectedAvailableSlot.flatMap(
+			slotInfoAndLocality -> slotPool.allocateAvailableSlot(
+				slotRequestId,
+				slotInfoAndLocality.getSlotInfo().getAllocationId())
+		);
+	}
+
+	private void schedulePendingRequestBulkTimeoutCheck(
+			final PhysicalSlotRequestBulk slotRequestBulk,
+			final Time timeout) {
+
+		componentMainThreadExecutor.schedule(() -> {
+			final PhysicalSlotRequestBulkTracker.TimeoutCheckResult result =
+				slotRequestBulkTracker.checkPhysicalSlotRequestBulkTimeout(slotRequestBulk, timeout);
+
+			switch (result) {
+				case PENDING:
+					//re-schedule the timeout check
+					schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+					break;
+				case TIMEOUT:
+					timeoutSlotRequestBulk(slotRequestBulk);
+					break;
+				default: // no action to take
+			}
+		}, timeout.getSize(), timeout.getUnit());
+	}
+
+	private void timeoutSlotRequestBulk(final PhysicalSlotRequestBulk slotRequestBulk) {
+		final Exception cause = new TimeoutException("Slot request bulk is not fulfillable!");
+		// pending requests must be canceled first otherwise they might be fulfilled by
+		// allocated slots released from this bulk
+		for (SlotRequestId slotRequestId : slotRequestBulk.getPendingRequests().keySet()) {
+			cancelSlotRequest(slotRequestId, cause);
+		}
+		for (SlotRequestId slotRequestId : slotRequestBulk.getFulfilledRequests().keySet()) {
+			cancelSlotRequest(slotRequestId, cause);
+		}
+	}
+
+	private Set<SlotInfo> getAllSlotInfos() {
+		return Stream
+			.concat(
+				slotPool.getAvailableSlotsInformation().stream(),
+				slotPool.getAllocatedSlotsInformation().stream())
+			.collect(Collectors.toSet());
+	}

Review comment:
       `PhysicalSlotRequestBulkTracker` does not perform any actions on slot requests. All actions (allocating/canceling) take place in `BulkSlotProviderImpl`. I think this makes it easier to reason about the slot status.
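
The split described in this comment can be sketched roughly as follows. This is a simplified illustration, not the code in the PR: the class `TrackerSeparationSketch`, the methods `check` and `act`, the `FULFILLED` constant, and the use of plain strings as request ids are all assumptions made for the example. The point is only that the checking side returns a decision and the acting side is the single place where requests are canceled.

```java
import java.util.ArrayList;
import java.util.List;

class TrackerSeparationSketch {

	/** Outcome of a timeout check (names assumed for this sketch). */
	enum CheckResult { FULFILLED, PENDING, TIMEOUT }

	/**
	 * Hypothetical stand-in for the tracker: it only decides, it never
	 * allocates or cancels anything.
	 */
	static CheckResult check(
			boolean allRequestsFulfilled,
			long unfulfillableSinceMillis,
			long nowMillis,
			long timeoutMillis) {
		if (allRequestsFulfilled) {
			return CheckResult.FULFILLED;
		}
		return nowMillis - unfulfillableSinceMillis >= timeoutMillis
			? CheckResult.TIMEOUT
			: CheckResult.PENDING;
	}

	/**
	 * Hypothetical stand-in for the provider: the only place that acts on
	 * requests. Returns the ids it would cancel for the given decision.
	 */
	static List<String> act(CheckResult result, List<String> requestIds) {
		final List<String> canceled = new ArrayList<>();
		if (result == CheckResult.TIMEOUT) {
			canceled.addAll(requestIds);
		}
		return canceled;
	}

	public static void main(String[] args) {
		final List<String> ids = List.of("request-1", "request-2");
		final CheckResult result = check(false, 0L, 1000L, 1000L);
		System.out.println(result + " -> cancel " + act(result, ids));
	}
}
```

Because `check` has no side effects in this sketch, it can be exercised with fixed timestamps, while every state change stays in `act`.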







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r437971960



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTracker.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.util.clock.Clock;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.IdentityHashMap;
+import java.util.Optional;
+import java.util.Set;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * Tracks physical slot request bulks. Once a bulk is registered, a fulfill-ability check for it will be scheduled.
+ */
+class PhysicalSlotRequestBulkTracker {
+
+	private final Supplier<Set<SlotInfo>> slotsRetriever;
+
+	private final Clock clock;
+
+	private final Set<PhysicalSlotRequestBulk> slotRequestBulks;
+
+	PhysicalSlotRequestBulkTracker(final Supplier<Set<SlotInfo>> slotsRetriever, final Clock clock) {
+		this.slotsRetriever = checkNotNull(slotsRetriever);
+		this.clock = checkNotNull(clock);
+		this.slotRequestBulks = Collections.newSetFromMap(new IdentityHashMap<>());
+	}
+
+	void track(final PhysicalSlotRequestBulk bulk) {
+		slotRequestBulks.add(bulk);
+
+		bulk.markUnfulfillable(clock.relativeTimeMillis());
+	}
+
+	void untrack(final PhysicalSlotRequestBulk bulk) {
+		slotRequestBulks.remove(bulk);
+	}
+
+	@VisibleForTesting
+	boolean isTracked(final PhysicalSlotRequestBulk bulk) {
+		return slotRequestBulks.contains(bulk);
+	}
+
+	/**
+	 * Check the slot request bulk and time out its requests if it has been unfulfillable for too long.
+	 * @param slotRequestBulk bulk of slot requests
+	 * @param slotRequestTimeout indicates how long a pending request can be unfulfillable

Review comment:
       done.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTracker.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.util.clock.Clock;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.IdentityHashMap;
+import java.util.Optional;
+import java.util.Set;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * Tracks physical slot request bulks. Once a bulk is registered, a fulfill-ability check for it will be scheduled.
+ */
+class PhysicalSlotRequestBulkTracker {
+
+	private final Supplier<Set<SlotInfo>> slotsRetriever;
+
+	private final Clock clock;
+
+	private final Set<PhysicalSlotRequestBulk> slotRequestBulks;
+
+	PhysicalSlotRequestBulkTracker(final Supplier<Set<SlotInfo>> slotsRetriever, final Clock clock) {
+		this.slotsRetriever = checkNotNull(slotsRetriever);
+		this.clock = checkNotNull(clock);
+		this.slotRequestBulks = Collections.newSetFromMap(new IdentityHashMap<>());
+	}
+
+	void track(final PhysicalSlotRequestBulk bulk) {
+		slotRequestBulks.add(bulk);
+
+		bulk.markUnfulfillable(clock.relativeTimeMillis());
+	}
+
+	void untrack(final PhysicalSlotRequestBulk bulk) {
+		slotRequestBulks.remove(bulk);
+	}
+
+	@VisibleForTesting
+	boolean isTracked(final PhysicalSlotRequestBulk bulk) {
+		return slotRequestBulks.contains(bulk);
+	}
+
+	/**
+	 * Check the slot request bulk and time out its requests if it has been unfulfillable for too long.

Review comment:
       done.







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * 1e045704e218f2cb0fb9dcaa31176c8943f468a5 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3141) 
   * af7d4926ee8bb041713feeed10770147509d2db8 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] azagrebin commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438647311



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImplTest.java
##########
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutorServiceAdapter;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.ExceptionUtils;
+import org.apache.flink.util.FlinkException;
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+import java.util.Optional;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.hamcrest.Matchers.hasSize;
+import static org.hamcrest.Matchers.instanceOf;
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+import static org.junit.Assert.fail;
+
+/**
+ * Tests for {@link BulkSlotProviderImpl}.
+ */
+public class BulkSlotProviderImplTest extends TestLogger {
+
+	private static final Time TIMEOUT = Time.milliseconds(1000L);
+
+	private static ScheduledExecutorService singleThreadScheduledExecutorService;
+
+	private static ComponentMainThreadExecutor mainThreadExecutor;
+
+	private TestingSlotPoolImpl slotPool;
+
+	private BulkSlotProviderImpl bulkSlotProvider;
+
+	private TestingSlotOwner slotOwner;
+
+	private ManualClock clock;
+
+	@BeforeClass
+	public static void setupClass() {
+		singleThreadScheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
+		mainThreadExecutor = ComponentMainThreadExecutorServiceAdapter.forSingleThreadExecutor(singleThreadScheduledExecutorService);
+	}
+
+	@AfterClass
+	public static void teardownClass() {
+		if (singleThreadScheduledExecutorService != null) {
+			singleThreadScheduledExecutorService.shutdownNow();
+		}
+	}
+
+	@Before
+	public void setup() throws Exception {
+		clock = new ManualClock();
+
+		slotPool = new SlotPoolBuilder(mainThreadExecutor).build();
+
+		bulkSlotProvider = new BulkSlotProviderImpl(LocationPreferenceSlotSelectionStrategy.createDefault(), slotPool);
+		bulkSlotProvider.start(mainThreadExecutor);
+
+		slotOwner = new TestingSlotOwner();

Review comment:
       Can `slotOwner` be a local variable?

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTrackerTest.java
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.executiongraph.utils.SimpleAckingTaskManagerGateway;
+import org.apache.flink.runtime.jobmanager.scheduler.Locality;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation;
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link PhysicalSlotRequestBulkTracker}.
+ */
+public class PhysicalSlotRequestBulkTrackerTest extends TestLogger {
+
+	private static final Time TIMEOUT = Time.milliseconds(5000L);
+
+	private ManualClock clock = new ManualClock();

Review comment:
       ```suggestion
   	private final ManualClock clock = new ManualClock();
   ```

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTrackerTest.java
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.executiongraph.utils.SimpleAckingTaskManagerGateway;
+import org.apache.flink.runtime.jobmanager.scheduler.Locality;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation;
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link PhysicalSlotRequestBulkTracker}.
+ */
+public class PhysicalSlotRequestBulkTrackerTest extends TestLogger {
+
+	private static final Time TIMEOUT = Time.milliseconds(5000L);
+
+	private ManualClock clock = new ManualClock();
+
+	private PhysicalSlotRequestBulkTracker bulkTracker;
+
+	private Set<PhysicalSlot> slots;
+
+	private Supplier<Set<SlotInfo>> slotsRetriever;
+
+	@Before
+	public void setup() throws Exception {
+		slots = new HashSet<>();
+		slotsRetriever = () -> slots.stream().collect(Collectors.toSet());
+		bulkTracker = new PhysicalSlotRequestBulkTracker(slotsRetriever, clock);
+	}
+
+	@Test
+	public void testTrackBulk() {
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Collections.emptyList());
+
+		assertThat(bulkTracker.isTracked(bulk), is(false));
+
+		bulkTracker.track(bulk);
+
+		assertThat(bulkTracker.isTracked(bulk), is(true));
+		assertThat(bulk.getUnfulfillableSince(), is(clock.relativeTimeMillis()));
+	}
+
+	@Test
+	public void testUntrackBulk() {
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Collections.emptyList());
+
+		bulkTracker.track(bulk);
+		bulkTracker.untrack(bulk);
+
+		assertThat(bulkTracker.isTracked(bulk), is(false));
+	}
+
+	@Test
+	public void testBulkFulfilledOnCheck() {
+		final PhysicalSlotRequest request = createPhysicalSlotRequest();
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Arrays.asList(request));

Review comment:
       ```suggestion
   		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Collections.singletonList(request));
   ```
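
For context on this suggestion, a small standalone sketch of how the two factory methods differ. The class name `SingletonListSketch` and the helper `rejectsSet` are made up for the example, and plain strings stand in for `PhysicalSlotRequest`. Both lists compare equal, but `Arrays.asList` returns a fixed-size list backed by an array whose elements can still be replaced, whereas `Collections.singletonList` returns an immutable one-element list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SingletonListSketch {

	/** Returns true if the list refuses to replace its first element. */
	static boolean rejectsSet(List<String> list) {
		try {
			list.set(0, "replaced");
			return false;
		} catch (UnsupportedOperationException e) {
			return true;
		}
	}

	public static void main(String[] args) {
		final List<String> viaArrays = Arrays.asList("request");
		final List<String> viaSingleton = Collections.singletonList("request");

		// Same contents, so List.equals treats them as equal.
		System.out.println("equal: " + viaArrays.equals(viaSingleton));

		// Arrays.asList allows set(); singletonList does not.
		System.out.println("Arrays.asList rejects set: " + rejectsSet(viaArrays));
		System.out.println("singletonList rejects set: " + rejectsSet(viaSingleton));
	}
}
```

In a test that never mutates the list, either call works; `singletonList` mainly states the single-element intent more directly.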

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImplTest.java
##########
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutorServiceAdapter;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.ExceptionUtils;
+import org.apache.flink.util.FlinkException;
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+import java.util.Optional;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.hamcrest.Matchers.hasSize;
+import static org.hamcrest.Matchers.instanceOf;
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+import static org.junit.Assert.fail;
+
+/**
+ * Tests for {@link BulkSlotProviderImpl}.
+ */
+public class BulkSlotProviderImplTest extends TestLogger {
+
+	private static final Time TIMEOUT = Time.milliseconds(1000L);
+
+	private static ScheduledExecutorService singleThreadScheduledExecutorService;
+
+	private static ComponentMainThreadExecutor mainThreadExecutor;
+
+	private TestingSlotPoolImpl slotPool;
+
+	private BulkSlotProviderImpl bulkSlotProvider;
+
+	private TestingSlotOwner slotOwner;
+
+	private ManualClock clock;
+
+	@BeforeClass
+	public static void setupClass() {
+		singleThreadScheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
+		mainThreadExecutor = ComponentMainThreadExecutorServiceAdapter.forSingleThreadExecutor(singleThreadScheduledExecutorService);
+	}
+
+	@AfterClass
+	public static void teardownClass() {
+		if (singleThreadScheduledExecutorService != null) {
+			singleThreadScheduledExecutorService.shutdownNow();
+		}
+	}
+
+	@Before
+	public void setup() throws Exception {
+		clock = new ManualClock();
+
+		slotPool = new SlotPoolBuilder(mainThreadExecutor).build();
+
+		bulkSlotProvider = new BulkSlotProviderImpl(LocationPreferenceSlotSelectionStrategy.createDefault(), slotPool);
+		bulkSlotProvider.start(mainThreadExecutor);
+
+		slotOwner = new TestingSlotOwner();
+		slotOwner.setReturnAllocatedSlotConsumer(
+			logicalSlot ->
+				slotPool.releaseSlot(
+					logicalSlot.getSlotRequestId(),
+					new FlinkException("Slot is being returned to the SlotPool.")));
+	}
+
+	@After
+	public void teardown() {
+		CompletableFuture.runAsync(() -> slotPool.close(), mainThreadExecutor).join();
+	}
+
+	@Test
+	public void testBulkSlotAllocationFulfilledWithAvailableSlots() throws Exception {
+		final PhysicalSlotRequest request1 = createPhysicalSlotRequest();
+		final PhysicalSlotRequest request2 = createPhysicalSlotRequest();
+		final List<PhysicalSlotRequest> requests = Arrays.asList(request1, request2);
+
+		addSlotToSlotPool();
+		addSlotToSlotPool();
+
+		final CompletableFuture<Collection<PhysicalSlotRequest.Result>> slotFutures = allocateSlots(requests);
+
+		final Collection<PhysicalSlotRequest.Result> results = slotFutures.get(TIMEOUT.getSize(), TIMEOUT.getUnit());
+		final Collection<SlotRequestId> resultRequestIds = results.stream()
+			.map(PhysicalSlotRequest.Result::getSlotRequestId)
+			.collect(Collectors.toList());
+
+		assertThat(resultRequestIds, containsInAnyOrder(request1.getSlotRequestId(), request2.getSlotRequestId()));
+	}
+
+	@Test
+	public void testBulkSlotAllocationFulfilledWithNewSlots() {
+		final List<PhysicalSlotRequest> requests = Arrays.asList(
+			createPhysicalSlotRequest(),
+			createPhysicalSlotRequest());
+		final CompletableFuture<Collection<PhysicalSlotRequest.Result>> slotFutures = allocateSlots(requests);
+
+		addSlotToSlotPool();
+
+		assertThat(slotFutures.isDone(), is(false));
+
+		addSlotToSlotPool();
+
+		assertThat(slotFutures.isDone(), is(true));
+		assertThat(slotFutures.isCompletedExceptionally(), is(false));
+	}
+
+	@Test
+	public void testBulkSlotAllocationTimeoutsIfUnfulfillable() {
+		final List<PhysicalSlotRequest> requests = Arrays.asList(
+			createPhysicalSlotRequest(),
+			createPhysicalSlotRequest());
+		final CompletableFuture<Collection<PhysicalSlotRequest.Result>> slotFutures = allocateSlots(requests);
+
+		addSlotToSlotPool();
+
+		assertThat(slotPool.getAllocatedSlots().listSlotInfo(), hasSize(1));
+
+		clock.advanceTime(TIMEOUT.toMilliseconds() + 1L, TimeUnit.MILLISECONDS);
+
+		try {
+			// wait until the requests time out
+			slotFutures.get();
+			fail("Expected that the slot futures time out.");
+		} catch (Exception e) {
+			final Optional<Throwable> cause = ExceptionUtils.findThrowableWithMessage(
+				e,
+				"Slot request bulk is not fulfillable!");
+			assertThat(cause.isPresent(), is(true));
+			assertThat(cause.get(), instanceOf(TimeoutException.class));
+		}
+	}
+
+	@Test
+	public void testFailedBulkSlotAllocationReleasesAllocatedSlot() {
+		final List<PhysicalSlotRequest> requests = Arrays.asList(
+			createPhysicalSlotRequest(),
+			createPhysicalSlotRequest());
+		final CompletableFuture<Collection<PhysicalSlotRequest.Result>> slotFutures = allocateSlots(requests);
+
+		addSlotToSlotPool();
+
+		assertThat(slotPool.getAllocatedSlots().listSlotInfo(), hasSize(1));
+
+		clock.advanceTime(TIMEOUT.toMilliseconds() + 1L, TimeUnit.MILLISECONDS);
+
+		try {
+			// wait until the requests time out
+			slotFutures.get();
+			fail("Expected that the slot futures time out.");
+		} catch (Exception e) {
+			// expected
+		}

Review comment:
       Could this be a private helper method `Exception allocateSlotsAndWaitForTimeout()`, to deduplicate this with the other tests?
   ```
   private Exception allocateSlotsAndWaitForTimeout() {
   	final List<PhysicalSlotRequest> requests = Arrays.asList(
   		createPhysicalSlotRequest(),
   		createPhysicalSlotRequest());
   	final CompletableFuture<Collection<PhysicalSlotRequest.Result>> slotFutures = allocateSlots(requests);
   
   	addSlotToSlotPool();
   
   	assertThat(slotPool.getAllocatedSlots().listSlotInfo(), hasSize(1));
   
   	clock.advanceTime(TIMEOUT.toMilliseconds() + 1L, TimeUnit.MILLISECONDS);
   
   	try {
   		// wait until the requests time out
   		slotFutures.get();
   	} catch (Exception e) {
   		// expected
   		return e;
   	}
   	fail("Expected that the slot futures time out.");
   	return new Exception("Unexpected");
   }
   ```
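   
   A related option, sketched here with plain JDK types only: such a helper can throw an `AssertionError` when no failure occurs, which avoids the dummy `return` after `fail(...)`. The names `ExpectedFailureSketch` and `waitForFailure` are made up for illustration and are not part of this PR.
   ```java
   import java.util.concurrent.CompletableFuture;
   
   public class ExpectedFailureSketch {
   
   	// Hypothetical helper: returns the exception thrown by future.get(),
   	// or fails the calling test if the future completes normally.
   	public static Exception waitForFailure(CompletableFuture<?> future) {
   		try {
   			future.get();
   		} catch (Exception e) {
   			// expected: hand the failure back so each test can assert on it
   			return e;
   		}
   		throw new AssertionError("Expected the future to complete exceptionally.");
   	}
   }
   ```
   A test that cares about the cause can inspect the returned exception (for a future completed exceptionally, `get()` wraps the cause in an `ExecutionException`), while a test that only needs the failure can ignore the return value.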

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTrackerTest.java
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.executiongraph.utils.SimpleAckingTaskManagerGateway;
+import org.apache.flink.runtime.jobmanager.scheduler.Locality;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation;
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link PhysicalSlotRequestBulkTracker}.
+ */
+public class PhysicalSlotRequestBulkTrackerTest extends TestLogger {
+
+	private static final Time TIMEOUT = Time.milliseconds(5000L);
+
+	private ManualClock clock = new ManualClock();
+
+	private PhysicalSlotRequestBulkTracker bulkTracker;
+
+	private Set<PhysicalSlot> slots;
+
+	private Supplier<Set<SlotInfo>> slotsRetriever;
+
+	@Before
+	public void setup() throws Exception {
+		slots = new HashSet<>();
+		slotsRetriever = () -> slots.stream().collect(Collectors.toSet());

Review comment:
       ```suggestion
   		slotsRetriever = () -> new HashSet<>(slots);
   ```
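   
   For illustration, a small JDK-only sketch of why the copy constructor should be a drop-in replacement here: both forms return an independent snapshot, and `HashSet(Collection<? extends E>)` accepts a set of a subtype, in the same way the test exposes its `Set<PhysicalSlot>` through a `Supplier<Set<SlotInfo>>`. `Integer` and `Number` stand in for those types, and the class and method names are hypothetical.
   ```java
   import java.util.HashSet;
   import java.util.Set;
   import java.util.function.Supplier;
   import java.util.stream.Collectors;
   
   public class SnapshotSupplierSketch {
   
   	// Original form: copy the elements through a stream.
   	public static Supplier<Set<Number>> viaStream(Set<Integer> source) {
   		return () -> source.stream().collect(Collectors.toSet());
   	}
   
   	// Suggested form: copy the elements through the HashSet copy constructor.
   	public static Supplier<Set<Number>> viaCopy(Set<Integer> source) {
   		return () -> new HashSet<>(source);
   	}
   }
   ```
   Each call to `get()` copies the current contents, so later changes to the source set do not affect a snapshot that was already handed out.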

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTrackerTest.java
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.executiongraph.utils.SimpleAckingTaskManagerGateway;
+import org.apache.flink.runtime.jobmanager.scheduler.Locality;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation;
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link PhysicalSlotRequestBulkTracker}.
+ */
+public class PhysicalSlotRequestBulkTrackerTest extends TestLogger {
+
+	private static final Time TIMEOUT = Time.milliseconds(5000L);
+
+	private ManualClock clock = new ManualClock();
+
+	private PhysicalSlotRequestBulkTracker bulkTracker;
+
+	private Set<PhysicalSlot> slots;
+
+	private Supplier<Set<SlotInfo>> slotsRetriever;
+
+	@Before
+	public void setup() throws Exception {
+		slots = new HashSet<>();
+		slotsRetriever = () -> slots.stream().collect(Collectors.toSet());
+		bulkTracker = new PhysicalSlotRequestBulkTracker(slotsRetriever, clock);
+	}
+
+	@Test
+	public void testTrackBulk() {
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Collections.emptyList());
+
+		assertThat(bulkTracker.isTracked(bulk), is(false));
+
+		bulkTracker.track(bulk);
+
+		assertThat(bulkTracker.isTracked(bulk), is(true));
+		assertThat(bulk.getUnfulfillableSince(), is(clock.relativeTimeMillis()));
+	}
+
+	@Test
+	public void testUntrackBulk() {
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Collections.emptyList());
+
+		bulkTracker.track(bulk);
+		bulkTracker.untrack(bulk);
+
+		assertThat(bulkTracker.isTracked(bulk), is(false));
+	}
+
+	@Test
+	public void testBulkFulfilledOnCheck() {
+		final PhysicalSlotRequest request = createPhysicalSlotRequest();
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Arrays.asList(request));

Review comment:
       Also other places





For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435194969



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this(slotSelectionStrategy, slotPool, SystemClock.getInstance());
+	}
+
+	@VisibleForTesting
+	BulkSlotProviderImpl(
+			final SlotSelectionStrategy slotSelectionStrategy,
+			final SlotPool slotPool,
+			final Clock clock) {
+
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+		this.clock = checkNotNull(clock);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(clock);
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request, timeout).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest,
+			final Time timeout) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else {
+			slotFuture = slotPool.requestNewAllocatedSlotWithoutTimeout(
+				slotRequestId,
+				resourceProfile,
+				!physicalSlotRequest.willSlotBeOccupiedIndefinitely());
+		}
+
+		return slotFuture.thenApply(physicalSlot -> new PhysicalSlotRequest.Result(slotRequestId, physicalSlot));
+	}
+
+	private Optional<PhysicalSlot> tryAllocateFromAvailable(
+			final SlotRequestId slotRequestId,
+			final SlotProfile slotProfile) {
+
+		final Collection<SlotSelectionStrategy.SlotInfoAndResources> slotInfoList =
+			slotPool.getAvailableSlotsInformation()
+				.stream()
+				.map(SlotSelectionStrategy.SlotInfoAndResources::fromSingleSlot)
+				.collect(Collectors.toList());
+
+		final Optional<SlotSelectionStrategy.SlotInfoAndLocality> selectedAvailableSlot =
+			slotSelectionStrategy.selectBestSlotForProfile(slotInfoList, slotProfile);
+
+		return selectedAvailableSlot.flatMap(
+			slotInfoAndLocality -> slotPool.allocateAvailableSlot(
+				slotRequestId,
+				slotInfoAndLocality.getSlotInfo().getAllocationId())
+		);
+	}
+
+	private void schedulePendingRequestBulkTimeoutCheck(
+		final PhysicalSlotRequestBulk slotRequestBulk,
+		final Time timeout) {
+
+		componentMainThreadExecutor.schedule(() -> {
+			if (!checkPhysicalSlotRequestBulkTimeout(slotRequestBulk, timeout)) {
+				schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+			}
+		}, timeout.getSize(), timeout.getUnit());
+	}
+
+	/**
+	 * Check the slot request bulk and time out its requests if it has been unfulfillable for too long.
+	 * @param slotRequestBulk bulk of slot requests
+	 * @param slotRequestTimeout indicates how long a pending request can be unfulfillable
+	 * @return true if the slot request bulk is done or timed out, otherwise false
+	 */
+	@VisibleForTesting
+	protected boolean checkPhysicalSlotRequestBulkTimeout(

Review comment:
       Sounds good to me. Let me take another look.







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r437981735



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(
+			this::getAllSlotInfos,
+			SystemClock.getInstance());
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public void cancelSlotRequest(SlotRequestId slotRequestId, Throwable cause) {
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		slotPool.releaseSlot(slotRequestId, cause);
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else if (physicalSlotRequest.willSlotBeOccupiedIndefinitely()) {
+			slotFuture = slotPool.requestNewAllocatedSlot(
+				slotRequestId,
+				resourceProfile,
+				null);
+		} else {
+			slotFuture = slotPool.requestNewAllocatedBatchSlot(
+				slotRequestId,
+				resourceProfile);
+		}
+
+		return slotFuture.thenApply(physicalSlot -> new PhysicalSlotRequest.Result(slotRequestId, physicalSlot));
+	}
+
+	private Optional<PhysicalSlot> tryAllocateFromAvailable(
+			final SlotRequestId slotRequestId,
+			final SlotProfile slotProfile) {
+
+		final Collection<SlotSelectionStrategy.SlotInfoAndResources> slotInfoList =
+			slotPool.getAvailableSlotsInformation()
+				.stream()
+				.map(SlotSelectionStrategy.SlotInfoAndResources::fromSingleSlot)
+				.collect(Collectors.toList());
+
+		final Optional<SlotSelectionStrategy.SlotInfoAndLocality> selectedAvailableSlot =
+			slotSelectionStrategy.selectBestSlotForProfile(slotInfoList, slotProfile);
+
+		return selectedAvailableSlot.flatMap(
+			slotInfoAndLocality -> slotPool.allocateAvailableSlot(
+				slotRequestId,
+				slotInfoAndLocality.getSlotInfo().getAllocationId())
+		);
+	}
+
+	private void schedulePendingRequestBulkTimeoutCheck(
+			final PhysicalSlotRequestBulk slotRequestBulk,
+			final Time timeout) {
+
+		componentMainThreadExecutor.schedule(() -> {
+			final PhysicalSlotRequestBulkTracker.TimeoutCheckResult result =
+				slotRequestBulkTracker.checkPhysicalSlotRequestBulkTimeout(slotRequestBulk, timeout);
+
+			switch (result) {
+				case PENDING:
+					//re-schedule the timeout check
+					schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+					break;
+				case TIMEOUT:
+					timeoutSlotRequestBulk(slotRequestBulk);
+					break;
+				default: // no action to take
+			}
+		}, timeout.getSize(), timeout.getUnit());
+	}
+
+	private void timeoutSlotRequestBulk(final PhysicalSlotRequestBulk slotRequestBulk) {
+		final Exception cause = new TimeoutException("Slot request bulk is not fulfillable!");
+		// pending requests must be canceled first otherwise they might be fulfilled by
+		// allocated slots released from this bulk
+		for (SlotRequestId slotRequestId : slotRequestBulk.getPendingRequests().keySet()) {
+			cancelSlotRequest(slotRequestId, cause);
+		}
+		for (SlotRequestId slotRequestId : slotRequestBulk.getFulfilledRequests().keySet()) {
+			cancelSlotRequest(slotRequestId, cause);
+		}
+	}
+
+	private Set<SlotInfo> getAllSlotInfos() {
+		return Stream
+			.concat(
+				slotPool.getAvailableSlotsInformation().stream(),
+				slotPool.getAllocatedSlotsInformation().stream())
+			.collect(Collectors.toSet());
+	}

Review comment:
       Sorry, I do not quite understand what you mean by 'simplify multithreading'.
   
   Currently we pass a `slotsRetriever` to `PhysicalSlotRequestBulkTracker`. It simplifies testing and hides the action methods of `SlotPool` that a `PhysicalSlotRequestBulkTracker` should not know about.
   `SchedulerImpl` invokes `checkPhysicalSlotRequestBulkTimeout`, and I think it's better to let it decide how to deal with the check results. That's why I put `timeoutSlotRequestBulk` in `SchedulerImpl`.
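   
   To illustrate the point about the `slotsRetriever`, here is a much-simplified, JDK-only sketch. It is not the code in this PR: `BulkCheckSketch`, `Checker`, `CheckResult` and the "enough slots by count" rule are assumptions made for the example (the PR describes the real check in terms of resource requirements).
   ```java
   import java.util.Set;
   import java.util.function.LongSupplier;
   import java.util.function.Supplier;
   
   public class BulkCheckSketch {
   
   	public enum CheckResult { PENDING, TIMEOUT }
   
   	// Hypothetical checker: it only sees a read-only view of the slots and a clock.
   	public static final class Checker {
   		private final Supplier<Set<String>> slotsRetriever;
   		private final LongSupplier clock;
   		// time since which the bulk has been unfulfillable, or -1 while it is fulfillable
   		private long unfulfillableSince;
   
   		public Checker(Supplier<Set<String>> slotsRetriever, LongSupplier clock) {
   			this.slotsRetriever = slotsRetriever;
   			this.clock = clock;
   			this.unfulfillableSince = clock.getAsLong();
   		}
   
   		// Assumption for the sketch: a bulk is fulfillable if enough slots exist.
   		public CheckResult check(int requiredSlots, long timeoutMillis) {
   			final long now = clock.getAsLong();
   			if (slotsRetriever.get().size() >= requiredSlots) {
   				unfulfillableSince = -1;
   				return CheckResult.PENDING;
   			}
   			if (unfulfillableSince < 0) {
   				unfulfillableSince = now;
   			}
   			return now - unfulfillableSince >= timeoutMillis
   				? CheckResult.TIMEOUT
   				: CheckResult.PENDING;
   		}
   	}
   }
   ```
   The checker never releases or cancels anything itself. It only reports a result, and the caller decides how to react, for example by re-scheduling the check on `PENDING` or cancelling the requests on `TIMEOUT`. A test can drive it with an in-memory set and a manually advanced clock.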







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r437972735



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPoolImpl.java
##########
@@ -447,12 +452,40 @@ public void releaseSlot(@Nonnull SlotRequestId slotRequestId, @Nullable Throwabl
 
 		componentMainThreadExecutor.assertRunningInMainThread();
 
+		checkState(batchSlotTimeoutCheckEnabled, "batch slot timeout check is disabled unexpectedly.");
+
 		final PendingRequest pendingRequest = PendingRequest.createBatchRequest(slotRequestId, resourceProfile);
 
 		return requestNewAllocatedSlotInternal(pendingRequest)
 			.thenApply(Function.identity());
 	}
 
+	@Override
+	public CompletableFuture<PhysicalSlot> requestNewAllocatedSlotWithoutTimeout(

Review comment:
       done.







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438653169



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTracker.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.util.clock.Clock;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.IdentityHashMap;
+import java.util.Optional;
+import java.util.Set;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * Tracks physical slot request bulks. Once a bulk is registered, a fulfill-ability check for it will be scheduled.
+ */
+class PhysicalSlotRequestBulkTracker {
+
+	private final Supplier<Set<SlotInfo>> slotsRetriever;
+
+	private final Clock clock;
+
+	private final Set<PhysicalSlotRequestBulk> slotRequestBulks;

Review comment:
       `PhysicalSlotRequestBulkChecker` sounds good to me.







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r434965538



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulk.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+/**
+ * Represents a bulk of physical slot requests.
+ */
+public class PhysicalSlotRequestBulk {
+
+	private final Map<SlotRequestId, ResourceProfile> pendingRequests;
+
+	private final Map<SlotRequestId, AllocationID> fulfilledRequests = new HashMap<>();
+
+	PhysicalSlotRequestBulk(final Collection<PhysicalSlotRequest> physicalSlotRequests) {
+		this.pendingRequests = physicalSlotRequests.stream()
+			.collect(Collectors.toMap(
+				PhysicalSlotRequest::getSlotRequestId,
+				r -> r.getSlotProfile().getPhysicalSlotResourceProfile()));
+	}
+
+	void markRequestFulfilled(final SlotRequestId slotRequestId, final AllocationID allocationID) {

Review comment:
       I think `markRequestFulfilled` is more accurate, since this method does not fulfill requests itself. It is more like a callback invoked once a slot request has been fulfilled.
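
The callback-style bookkeeping described above could look roughly like the following. This is a minimal, self-contained sketch and not the code from the PR: the class name `RequestBulkSketch` and its getters are hypothetical, and plain `String` values stand in for `SlotRequestId`, `ResourceProfile`, and `AllocationID`. The body of `markRequestFulfilled` is an assumption, since the quoted diff ends at the method signature.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for PhysicalSlotRequestBulk. Strings replace the Flink
// types SlotRequestId, ResourceProfile and AllocationID to keep it self-contained.
final class RequestBulkSketch {

    // request id -> requested resource profile
    private final Map<String, String> pendingRequests;

    // request id -> id of the allocation that fulfilled it
    private final Map<String, String> fulfilledRequests = new HashMap<>();

    RequestBulkSketch(Map<String, String> requestedProfiles) {
        this.pendingRequests = new HashMap<>(requestedProfiles);
    }

    // Callback: another component has already fulfilled the request. This method
    // only records that fact by moving the entry from pending to fulfilled.
    void markRequestFulfilled(String slotRequestId, String allocationId) {
        pendingRequests.remove(slotRequestId);
        fulfilledRequests.put(slotRequestId, allocationId);
    }

    Set<String> getPendingRequestIds() {
        return Collections.unmodifiableSet(pendingRequests.keySet());
    }

    Map<String, String> getFulfilledRequests() {
        return Collections.unmodifiableMap(fulfilledRequests);
    }

    public static void main(String[] args) {
        Map<String, String> requests = new HashMap<>();
        requests.put("request-1", "small");
        requests.put("request-2", "large");

        RequestBulkSketch bulk = new RequestBulkSketch(requests);
        bulk.markRequestFulfilled("request-1", "allocation-1");

        System.out.println("pending: " + bulk.getPendingRequestIds());
        System.out.println("fulfilled: " + bulk.getFulfilledRequests());
    }
}
```

The name `markRequestFulfilled` fits this shape: the method changes only the bulk's own records and never touches the slot pool.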







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r434987385



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this(slotSelectionStrategy, slotPool, SystemClock.getInstance());
+	}
+
+	@VisibleForTesting
+	BulkSlotProviderImpl(
+			final SlotSelectionStrategy slotSelectionStrategy,
+			final SlotPool slotPool,
+			final Clock clock) {
+
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+		this.clock = checkNotNull(clock);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(clock);
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request, timeout).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest,
+			final Time timeout) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else {
+			slotFuture = slotPool.requestNewAllocatedSlotWithoutTimeout(

Review comment:
       I'm wondering whether we should treat the slot pool more like a pool, and thus separate the concerns of bulk allocation failures and external request failures.
   1. If a bulk allocation fails because it is not fulfillable, we can print debug-level logs describing the slot requests and the state of the available/allocated slots in the slot pool.
   2. External request failures should be exposed in the logs to show that Flink failed to extend the pool, which can be the underlying reason why a bulk of slot requests fails. However, an external slot request failure does not mean that the internal slot request will definitely fail.
   
   I'm even considering enabling auto-retry for failed external slot requests, rather than failing the initiating internal slot request, to make the slot pool behave more like a pool. (related to this [comment](https://github.com/apache/flink/pull/12278#discussion_r433607269))
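
To make the auto-retry idea concrete, here is a rough sketch using only `java.util.concurrent`. It is an illustration under stated assumptions, not Flink's slot pool API: `ExternalRequestRetrySketch`, `requestWithRetry`, and the `maxRetries` bound are hypothetical names, and the `Supplier` stands in for whatever issues the external request to the resource manager.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical helper, not part of Flink. It retries a failed external request
// instead of immediately failing the internal request that initiated it.
final class ExternalRequestRetrySketch {

    // Returns a future that completes with the first successful attempt. It fails
    // only after the initial attempt and maxRetries further attempts have all failed.
    static <T> CompletableFuture<T> requestWithRetry(
            Supplier<CompletableFuture<T>> externalRequest,
            int maxRetries) {

        final CompletableFuture<T> result = new CompletableFuture<>();
        attempt(externalRequest, maxRetries, result);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> externalRequest,
            int retriesLeft,
            CompletableFuture<T> result) {

        externalRequest.get().whenComplete((value, failure) -> {
            if (failure == null) {
                result.complete(value);
            } else if (retriesLeft > 0) {
                // Surface the external failure in the logs, but keep the
                // internal request pending and try again.
                System.err.println("External slot request failed, retrying: " + failure);
                attempt(externalRequest, retriesLeft - 1, result);
            } else {
                result.completeExceptionally(failure);
            }
        });
    }
}
```

In this sketch the internal request sees a failure only when the retry budget is exhausted, which matches the point that a single external failure need not fail the internal slot request. A real implementation would presumably also need a delay between attempts and would have to run on the main thread executor; both are left out here.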







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * a9a9a40e5fedb82c5602b84f59894d8e0ee5b679 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3219) 
   * d3f81cb67e5924bfc9d3e7bbba25af281e6b3102 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * af7d4926ee8bb041713feeed10770147509d2db8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3154) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * 395c6cba8c463cffe2cf3cc70d91e6c6ff0dbb9f Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2420) 
   * 40df0ed6d44b0e9cd11faa7707490045b48151b0 UNKNOWN
   



[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * a9a9a40e5fedb82c5602b84f59894d8e0ee5b679 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3219) 
   * d3f81cb67e5924bfc9d3e7bbba25af281e6b3102 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3230) 
   



[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435199481



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPoolImpl.java
##########
@@ -447,12 +452,40 @@ public void releaseSlot(@Nonnull SlotRequestId slotRequestId, @Nullable Throwabl
 
 		componentMainThreadExecutor.assertRunningInMainThread();
 
+		checkState(batchSlotTimeoutCheckEnabled, "batch slot timeout check is disabled unexpectedly.");
+
 		final PendingRequest pendingRequest = PendingRequest.createBatchRequest(slotRequestId, resourceProfile);
 
 		return requestNewAllocatedSlotInternal(pendingRequest)
 			.thenApply(Function.identity());
 	}
 
+	@Override
+	public CompletableFuture<PhysicalSlot> requestNewAllocatedSlotWithoutTimeout(

Review comment:
       done via 9b5ca3d921855ddc3557d72202526cec6c474be8 and 54bd4613db64de1ff5b7fcdf29931a54895c3163







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * 395c6cba8c463cffe2cf3cc70d91e6c6ff0dbb9f Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2420) 
   * 40df0ed6d44b0e9cd11faa7707490045b48151b0 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2514) 
   





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435845301



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this(slotSelectionStrategy, slotPool, SystemClock.getInstance());
+	}
+
+	@VisibleForTesting
+	BulkSlotProviderImpl(
+			final SlotSelectionStrategy slotSelectionStrategy,
+			final SlotPool slotPool,
+			final Clock clock) {
+
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+		this.clock = checkNotNull(clock);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(clock);
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request, timeout).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest,
+			final Time timeout) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else {
+			slotFuture = slotPool.requestNewAllocatedSlotWithoutTimeout(
+				slotRequestId,
+				resourceProfile,
+				!physicalSlotRequest.willSlotBeOccupiedIndefinitely());
+		}
+
+		return slotFuture.thenApply(physicalSlot -> new PhysicalSlotRequest.Result(slotRequestId, physicalSlot));
+	}
+
+	private Optional<PhysicalSlot> tryAllocateFromAvailable(
+			final SlotRequestId slotRequestId,
+			final SlotProfile slotProfile) {
+
+		final Collection<SlotSelectionStrategy.SlotInfoAndResources> slotInfoList =
+			slotPool.getAvailableSlotsInformation()
+				.stream()
+				.map(SlotSelectionStrategy.SlotInfoAndResources::fromSingleSlot)
+				.collect(Collectors.toList());
+
+		final Optional<SlotSelectionStrategy.SlotInfoAndLocality> selectedAvailableSlot =
+			slotSelectionStrategy.selectBestSlotForProfile(slotInfoList, slotProfile);
+
+		return selectedAvailableSlot.flatMap(
+			slotInfoAndLocality -> slotPool.allocateAvailableSlot(
+				slotRequestId,
+				slotInfoAndLocality.getSlotInfo().getAllocationId())
+		);
+	}
+
+	private void schedulePendingRequestBulkTimeoutCheck(
+		final PhysicalSlotRequestBulk slotRequestBulk,
+		final Time timeout) {
+
+		componentMainThreadExecutor.schedule(() -> {
+			if (!checkPhysicalSlotRequestBulkTimeout(slotRequestBulk, timeout)) {
+				schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+			}
+		}, timeout.getSize(), timeout.getUnit());
+	}
+
+	/**
+	 * Check the slot request bulk and timeout its requests if it has been unfilfillable for too long.
+	 * @param slotRequestBulk bulk of slot requests
+	 * @param slotRequestTimeout indicates how long a pending request can be unfilfillable
+	 * @return true if the slot request bulk is done or timed out, otherwise false
+	 */
+	@VisibleForTesting
+	protected boolean checkPhysicalSlotRequestBulkTimeout(

Review comment:
       Done in a59c25e8ef5dab1571564424a821e1ea9dcd50be.
   The main changes are:
   1. `unfulfillableSinceTimestamp` and its related methods are moved into `PhysicalSlotRequestBulk`. The related tests are moved to `PhysicalSlotRequestBulkTest`.
   2. The bulk timeout check methods are moved into `PhysicalSlotRequestBulkTracker`. The tests are reworked.
   3. The tests for `PhysicalSlotRequestBulkTracker#isSlotRequestBulkFulfillable` are kept, but the method is now static.
   4. `PhysicalSlotRequestBulkTracker#checkPhysicalSlotRequestBulkTimeout` is reworked to return FULFILLED/PENDING/TIMEOUT. The slot releasing on TIMEOUT will be invoked in `schedulePendingRequestBulkTimeoutCheck`.
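   
   To illustrate the three-state check in item 4, here is a minimal, self-contained sketch. It is not the code from the commit: `BulkTimeoutCheckSketch`, `Result`, `Bulk`, and the `fulfillable` flag passed in by the caller are hypothetical stand-ins, and the real tracker derives fulfillability from the slot pool state.

```java
// A sketch only: all names here are hypothetical, not the classes in the PR.
final class BulkTimeoutCheckSketch {

    /** Outcome of one timeout check, mirroring FULFILLED/PENDING/TIMEOUT. */
    enum Result { FULFILLED, PENDING, TIMEOUT }

    /** Stand-in for a request bulk that remembers since when it is unfulfillable. */
    static final class Bulk {
        int pendingRequests;
        long unfulfillableSince = Long.MAX_VALUE; // MAX_VALUE means "currently fulfillable"

        Bulk(int pendingRequests) {
            this.pendingRequests = pendingRequests;
        }
    }

    static Result check(Bulk bulk, boolean fulfillable, long now, long timeout) {
        if (bulk.pendingRequests == 0) {
            return Result.FULFILLED; // nothing left to wait for
        }
        if (fulfillable) {
            bulk.unfulfillableSince = Long.MAX_VALUE; // reset the timestamp
            return Result.PENDING;
        }
        if (bulk.unfulfillableSince == Long.MAX_VALUE) {
            bulk.unfulfillableSince = now; // first time seen as unfulfillable
        }
        return now - bulk.unfulfillableSince >= timeout ? Result.TIMEOUT : Result.PENDING;
    }

    public static void main(String[] args) {
        Bulk bulk = new Bulk(2);
        System.out.println(check(bulk, false, 0L, 100L));   // PENDING
        System.out.println(check(bulk, false, 100L, 100L)); // TIMEOUT
    }
}
```

   With this shape, the caller that schedules the check can re-schedule on PENDING, stop on FULFILLED, and release the slots on TIMEOUT.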







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r434987385



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this(slotSelectionStrategy, slotPool, SystemClock.getInstance());
+	}
+
+	@VisibleForTesting
+	BulkSlotProviderImpl(
+			final SlotSelectionStrategy slotSelectionStrategy,
+			final SlotPool slotPool,
+			final Clock clock) {
+
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+		this.clock = checkNotNull(clock);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(clock);
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request, timeout).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest,
+			final Time timeout) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else {
+			slotFuture = slotPool.requestNewAllocatedSlotWithoutTimeout(

Review comment:
       I'm wondering whether we should treat the slot pool more like a pool, and thus separate the concerns of bulk allocation failures and external request failures.
   1. If a bulk allocation fails because it is not fulfillable, we can print debug-level logs describing the slot requests and the state of the available/allocated slots in the slot pool.
   2. External request failures should be exposed in the logs to show that we failed to extend the pool, which can be the underlying reason why a bulk of slot requests fails. However, an external slot request failure does not mean that the internal slot request will definitely fail.
   
   I'm even thinking of enabling auto-retry for failed external slot requests, rather than failing the initiating internal slot request, to make the slot pool more like a pool. (related to this [comment](https://github.com/apache/flink/pull/12278#discussion_r433607269))
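   
   A rough sketch of what such an auto-retry could look like, under stated assumptions: `ExternalRequestRetrySketch`, `retry`, and the attempt limit are hypothetical and not part of this PR or of the slot pool. The idea is that the future handed to the internal request stays pending while the external request is re-issued, and each external failure is still logged.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// A sketch only: hypothetical helper, not an existing slot pool API.
final class ExternalRequestRetrySketch {

    /** Re-issues the external request until it succeeds or maxAttempts is used up. */
    static <T> CompletableFuture<T> retry(Supplier<CompletableFuture<T>> request, int maxAttempts) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(request, maxAttempts, result);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> request,
            int attemptsLeft,
            CompletableFuture<T> result) {

        request.get().whenComplete((value, failure) -> {
            if (failure == null) {
                result.complete(value);
            } else if (attemptsLeft > 1) {
                // Expose the external failure, but keep the internal request pending.
                System.out.println("External request failed, retrying: " + failure.getMessage());
                attempt(request, attemptsLeft - 1, result);
            } else {
                result.completeExceptionally(failure);
            }
        });
    }
}
```

   Whether the retry should be bounded at all, or rather governed by the bulk timeout, is an open design question that this sketch does not answer.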







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * d3f81cb67e5924bfc9d3e7bbba25af281e6b3102 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3230) 
   * f9ea7bdfe11e677c214913340ef804e3e28c25aa UNKNOWN
   * bdf970c21e07f96589b85949ea7b697ac5c64c36 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3274) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * 4fe765587c0423b2d29d5dde946b94edff79398b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2346) 
   * c0483c6347992b8c4412da489b5584879834c396 UNKNOWN
   





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435840523



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotProvider.java
##########
@@ -91,6 +92,20 @@
 			allocationTimeout);
 	}
 
+	/**
+	 * Allocates a bulk of physical slots. The allocation will be completed
+	 * normally only when all the requests are fulfilled.
+	 *
+	 * @param physicalSlotRequests requests for physical slots
+	 * @param timeout indicating how long it is accepted that the slot requests can be unfulfillable
+	 * @return future of the results of slot requests
+	 */
+	default CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(

Review comment:
       Ok. It would be good to limit `OneSlotPerExecutionSlotAllocator` to only use methods of the new interface.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotProvider.java
##########
@@ -91,6 +92,20 @@
 			allocationTimeout);
 	}
 
+	/**
+	 * Allocates a bulk of physical slots. The allocation will be completed
+	 * normally only when all the requests are fulfilled.
+	 *
+	 * @param physicalSlotRequests requests for physical slots
+	 * @param timeout indicating how long it is accepted that the slot requests can be unfulfillable
+	 * @return future of the results of slot requests
+	 */
+	default CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(

Review comment:
       done.







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * 395c6cba8c463cffe2cf3cc70d91e6c6ff0dbb9f Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2420) 
   





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r434966411



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {

Review comment:
       I think it's not needed since this is the only valid entry point for production. The other constructor is for testing purposes only.







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438015223



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTracker.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.util.clock.Clock;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.IdentityHashMap;
+import java.util.Optional;
+import java.util.Set;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * Tracks physical slot request bulks. Once a bulk is registered, a fulfill-ability check for it will be scheduled.
+ */
+class PhysicalSlotRequestBulkTracker {
+
+	private final Supplier<Set<SlotInfo>> slotsRetriever;
+
+	private final Clock clock;
+
+	private final Set<PhysicalSlotRequestBulk> slotRequestBulks;

Review comment:
       Yes, we can remove them. I had wanted to keep the bulks tracked for sanity checks and diagnostics, but of course it is not necessary.







[GitHub] [flink] zhuzhurk merged pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk merged pull request #12375:
URL: https://github.com/apache/flink/pull/12375


   





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r437993765



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTracker.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.util.clock.Clock;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.IdentityHashMap;
+import java.util.Optional;
+import java.util.Set;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * Tracks physical slot request bulks. Once a bulk is registered, a fulfill-ability check for it will be scheduled.
+ */
+class PhysicalSlotRequestBulkTracker {
+
+	private final Supplier<Set<SlotInfo>> slotsRetriever;
+
+	private final Clock clock;
+
+	private final Set<PhysicalSlotRequestBulk> slotRequestBulks;
+
+	PhysicalSlotRequestBulkTracker(final Supplier<Set<SlotInfo>> slotsRetriever, final Clock clock) {
+		this.slotsRetriever = checkNotNull(slotsRetriever);
+		this.clock = checkNotNull(clock);
+		this.slotRequestBulks = Collections.newSetFromMap(new IdentityHashMap<>());
+	}
+
+	void track(final PhysicalSlotRequestBulk bulk) {
+		slotRequestBulks.add(bulk);
+
+		bulk.markUnfulfillable(clock.relativeTimeMillis());
+	}
+
+	void untrack(final PhysicalSlotRequestBulk bulk) {
+		slotRequestBulks.remove(bulk);
+	}
+
+	@VisibleForTesting
+	boolean isTracked(final PhysicalSlotRequestBulk bulk) {
+		return slotRequestBulks.contains(bulk);
+	}
+
+	/**
+	 * Check the slot request bulk and time out its requests if it has been unfulfillable for too long.
+	 * @param slotRequestBulk bulk of slot requests
+	 * @param slotRequestTimeout indicates how long a pending request can be unfulfillable
+	 * @return result of the check, indicating the bulk is fulfilled, still pending, or timed out
+	 */
+	TimeoutCheckResult checkPhysicalSlotRequestBulkTimeout(
+			final PhysicalSlotRequestBulk slotRequestBulk,
+			final Time slotRequestTimeout) {
+
+		checkState(isTracked(slotRequestBulk));
+
+		if (slotRequestBulk.getPendingRequests().isEmpty()) {
+			return TimeoutCheckResult.FULFILLED;
+		}
+
+		final boolean fulfillable = isSlotRequestBulkFulfillable(slotRequestBulk, slotsRetriever);
+		if (fulfillable) {
+			slotRequestBulk.markFulfillable();
+		} else {
+			final long currentTimestamp = clock.relativeTimeMillis();
+
+			slotRequestBulk.markUnfulfillable(currentTimestamp);
+
+			final long unfulfillableSince = slotRequestBulk.getUnfulfillableSince();
+			if (unfulfillableSince + slotRequestTimeout.toMilliseconds() <= currentTimestamp) {
+				return TimeoutCheckResult.TIMEOUT;
+			}
+		}
+
+		return TimeoutCheckResult.PENDING;
+	}
+
+	/**
+	 * Returns whether the given bulk of slot requests are possible to be fulfilled at the same time
+	 * with all the reusable slots in the slot pool. A reusable slot means the slot is available or
+	 * will not be occupied indefinitely.
+	 *
+	 * @param slotRequestBulk bulk of slot requests to check
+	 * @param slotsRetriever supplies slots to be used for the fulfill-ability check
+	 * @return true if the slot requests are possible to be fulfilled, otherwise false
+	 */
+	@VisibleForTesting
+	static boolean isSlotRequestBulkFulfillable(

Review comment:
       Yes, it's possible to test `isSlotRequestBulkFulfillable` through `checkPhysicalSlotRequestBulkTimeout`, but that would make the tests harder to understand and maintain.
   `isSlotRequestBulkFulfillable` is a static utility method, so I think it's fine to test it separately.
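
For readers following the thread, the timeout check quoted above can be reduced to a small state machine. The following is a minimal sketch, not the Flink implementation: `BulkTimeoutSketch`, `Bulk` and `check` are hypothetical names, the fulfillability result is passed in as a plain boolean (which is what makes the two concerns testable separately), and the `Long.MAX_VALUE` sentinel plus the "only remember the first unfulfillable timestamp" behaviour of `markUnfulfillable` are assumptions, since that method's body is not shown in the diff.

```java
// Simplified, hypothetical model of the timeout check; these are not the Flink classes.
class BulkTimeoutSketch {

    enum Result { FULFILLED, PENDING, TIMEOUT }

    /** Stand-in for PhysicalSlotRequestBulk that keeps only what the check needs. */
    static final class Bulk {
        int pendingRequests;
        // Long.MAX_VALUE means "currently considered fulfillable" (an assumption of this sketch).
        long unfulfillableSince = Long.MAX_VALUE;

        Bulk(int pendingRequests) {
            this.pendingRequests = pendingRequests;
        }

        void markFulfillable() {
            unfulfillableSince = Long.MAX_VALUE;
        }

        // Assumed behaviour: remember only the first moment the bulk became unfulfillable.
        void markUnfulfillable(long nowMillis) {
            if (unfulfillableSince == Long.MAX_VALUE) {
                unfulfillableSince = nowMillis;
            }
        }
    }

    /** Mirrors the control flow of the quoted check, with fulfillability supplied by the caller. */
    static Result check(Bulk bulk, boolean fulfillable, long nowMillis, long timeoutMillis) {
        if (bulk.pendingRequests == 0) {
            return Result.FULFILLED;
        }
        if (fulfillable) {
            bulk.markFulfillable();
            return Result.PENDING;
        }
        bulk.markUnfulfillable(nowMillis);
        if (bulk.unfulfillableSince + timeoutMillis <= nowMillis) {
            return Result.TIMEOUT;
        }
        return Result.PENDING;
    }

    public static void main(String[] args) {
        Bulk bulk = new Bulk(2);
        System.out.println(check(bulk, false, 0L, 100L));   // PENDING
        System.out.println(check(bulk, false, 100L, 100L)); // TIMEOUT
    }
}
```

Because the clock value and the fulfillability flag are both plain arguments here, each branch can be exercised without a slot pool, which is roughly the property the comment above argues for.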







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * af7d4926ee8bb041713feeed10770147509d2db8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3154) 
   * a9a9a40e5fedb82c5602b84f59894d8e0ee5b679 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438603956



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTracker.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.util.clock.Clock;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.IdentityHashMap;
+import java.util.Optional;
+import java.util.Set;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * Tracks physical slot request bulks. Once a bulk is registered, a fulfill-ability check for it will be scheduled.
+ */
+class PhysicalSlotRequestBulkTracker {
+
+	private final Supplier<Set<SlotInfo>> slotsRetriever;
+
+	private final Clock clock;
+
+	private final Set<PhysicalSlotRequestBulk> slotRequestBulks;

Review comment:
       What do you think of renaming `PhysicalSlotRequestBulkTracker` to `PhysicalSlotRequestBulkHelper`, since it does not track the bulks but provides several methods that help with bulk creation, initialization and checking?







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * 40df0ed6d44b0e9cd11faa7707490045b48151b0 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2514) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * c0483c6347992b8c4412da489b5584879834c396 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2350) 
   * a758c3a88a0a6cc2e0b183b20a17fe618f8cbbb9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2396) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] azagrebin commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r431914289



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulk.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+/**
+ * Represents a bulk of physical slot requests.
+ */
+public class PhysicalSlotRequestBulk {

Review comment:
       ```suggestion
   class PhysicalSlotRequestBulk {
   ```

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotAllocationTest.java
##########
@@ -0,0 +1,370 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutorServiceAdapter;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.LogicalSlot;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.ExceptionUtils;
+import org.apache.flink.util.FlinkException;
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.runtime.jobmaster.slotpool.AllocatedSlotOccupationTest.allocateSingleLogicalSlotFromPhysicalSlot;
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.hamcrest.Matchers.hasSize;
+import static org.hamcrest.Matchers.instanceOf;
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+import static org.junit.Assert.fail;
+
+/**
+ * Tests whether bulk slot allocation works correctly.
+ */
+public class BulkSlotAllocationTest extends TestLogger {

Review comment:
       ```suggestion
   public class BulkSlotProviderImplTest extends TestLogger {
   ```

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/AllocatedSlotOccupationTest.java
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.executiongraph.utils.SimpleAckingTaskManagerGateway;
+import org.apache.flink.runtime.jobmanager.scheduler.Locality;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.LogicalSlot;
+import org.apache.flink.runtime.jobmaster.SlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation;
+import org.apache.flink.util.TestLogger;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Tests whether the slot occupation state of {@link AllocatedSlot} is correct.
+ */
+public class AllocatedSlotOccupationTest extends TestLogger {
+
+	@Test
+	public void testSingleTaskOccupyingSlotIndefinitely() {
+		final PhysicalSlot physicalSlot = createPhysicalSlot();
+		allocateSingleLogicalSlotFromPhysicalSlot(physicalSlot, true);
+
+		assertTrue(physicalSlot.willBeOccupiedIndefinitely());
+	}
+
+	@Test
+	public void testSingleTaskNotOccupyingSlotIndefinitely() {
+		final PhysicalSlot physicalSlot = createPhysicalSlot();
+		allocateSingleLogicalSlotFromPhysicalSlot(physicalSlot, false);
+
+		assertFalse(physicalSlot.willBeOccupiedIndefinitely());
+	}
+
+	private static PhysicalSlot createPhysicalSlot() {
+		return new AllocatedSlot(
+			new AllocationID(),
+			new LocalTaskManagerLocation(),
+			0,
+			ResourceProfile.ANY,
+			new SimpleAckingTaskManagerGateway());
+	}
+
+	static LogicalSlot allocateSingleLogicalSlotFromPhysicalSlot(
+			final PhysicalSlot physicalSlot,
+			final boolean slotWillBeOccupiedIndefinitely) {
+
+		return allocateSingleLogicalSlotFromPhysicalSlot(
+			new SlotRequestId(),
+			physicalSlot,
+			new TestingSlotOwner(),
+			slotWillBeOccupiedIndefinitely);
+	}
+
+	static LogicalSlot allocateSingleLogicalSlotFromPhysicalSlot(

Review comment:
       nit: We have this logic in some places in production as well.
   Maybe this could be factored out into `SingleLogicalSlot#allocateFromPhysicalSlot`?

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPoolImpl.java
##########
@@ -447,12 +452,40 @@ public void releaseSlot(@Nonnull SlotRequestId slotRequestId, @Nullable Throwabl
 
 		componentMainThreadExecutor.assertRunningInMainThread();
 
+		checkState(batchSlotTimeoutCheckEnabled, "batch slot timeout check is disabled unexpectedly.");
+
 		final PendingRequest pendingRequest = PendingRequest.createBatchRequest(slotRequestId, resourceProfile);
 
 		return requestNewAllocatedSlotInternal(pendingRequest)
 			.thenApply(Function.identity());
 	}
 
+	@Override
+	public CompletableFuture<PhysicalSlot> requestNewAllocatedSlotWithoutTimeout(

Review comment:
       I am wondering whether it would be clearer just to add `disableBatchSlotTimeoutCheck` explicitly to the public interface.
   We could also make `timeout` `Nullable` in `requestNewAllocatedSlot`; then `FutureUtils.orTimeout` can be conditional in `requestNewAllocatedSlot`.
   Then we do not need `requestNewAllocatedSlotWithoutTimeout`, nor any assumptions in it about `batchSlotTimeoutCheckEnabled`.
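
The nullable-timeout idea can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the JDK's `CompletableFuture#orTimeout` (Java 9+) in place of Flink's `FutureUtils.orTimeout`, and `OptionalTimeoutSketch` and `withOptionalTimeout` are hypothetical names that do not appear in the PR.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating a timeout that is applied only when one is given.
class OptionalTimeoutSketch {

    /**
     * Returns the future unchanged when timeout is null; otherwise arms a timeout on it.
     * The JDK's orTimeout completes the same future exceptionally with a TimeoutException
     * if it has not completed in time.
     */
    static <T> CompletableFuture<T> withOptionalTimeout(CompletableFuture<T> future, Duration timeout) {
        if (timeout == null) {
            return future; // caller opted out of the per-request timeout
        }
        return future.orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        CompletableFuture<String> noTimeout = withOptionalTimeout(new CompletableFuture<>(), null);
        CompletableFuture<String> timed =
            withOptionalTimeout(new CompletableFuture<>(), Duration.ofMillis(10));
        timed.exceptionally(t -> "timed out").thenAccept(System.out::println).join();
        System.out.println("still pending without timeout: " + !noTimeout.isDone());
    }
}
```

With a single entry point like this, a caller that handles timeouts itself (as the bulk check does) would pass `null`, and no separate "without timeout" method would be needed.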

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotProvider.java
##########
@@ -91,6 +92,20 @@
 			allocationTimeout);
 	}
 
+	/**
+	 * Allocates a bulk of physical slots. The allocation will be completed
+	 * normally only when all the requests are fulfilled.
+	 *
+	 * @param physicalSlotRequests requests for physical slots
+	 * @param timeout indicating how long it is accepted that the slot requests can be unfulfillable
+	 * @return future of the results of slot requests
+	 */
+	default CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(

Review comment:
       Maybe, `SlotProvider` could just extend `BulkSlotProvider`?

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {

Review comment:
       What do you think about `BulkSlotProviderImpl#createWithSystemClock()`?
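
A static factory along these lines might look like the sketch below. It is only an illustration under stated assumptions: `ProviderSketch` is a hypothetical stand-in for `BulkSlotProviderImpl`, `java.time.Clock` replaces Flink's `Clock`, and `createWithClock` is an invented name for the test-facing variant.

```java
import java.time.Clock;
import java.util.Objects;

/** Hypothetical stand-in for BulkSlotProviderImpl; not part of Flink. */
final class ProviderSketch {

	private final Clock clock;

	// the constructor is private, so callers must pick a named factory
	private ProviderSketch(final Clock clock) {
		this.clock = Objects.requireNonNull(clock);
	}

	/** Production entry point: the choice of the system clock is visible in the name. */
	static ProviderSketch createWithSystemClock() {
		return new ProviderSketch(Clock.systemUTC());
	}

	/** Assumed test entry point that accepts a controllable clock. */
	static ProviderSketch createWithClock(final Clock clock) {
		return new ProviderSketch(clock);
	}

	Clock getClock() {
		return clock;
	}
}
```

The named factories make the clock choice explicit at the call site, which an overloaded constructor does not.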

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this(slotSelectionStrategy, slotPool, SystemClock.getInstance());
+	}
+
+	@VisibleForTesting
+	BulkSlotProviderImpl(
+			final SlotSelectionStrategy slotSelectionStrategy,
+			final SlotPool slotPool,
+			final Clock clock) {
+
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+		this.clock = checkNotNull(clock);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(clock);
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request, timeout).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest,
+			final Time timeout) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else {
+			slotFuture = slotPool.requestNewAllocatedSlotWithoutTimeout(

Review comment:
       Maybe no immediate action is needed now, but I find it quite hard to reason about failures in `SlotPoolImpl`, especially for bulk allocation.
   With this new concept of fulfillability, maybe we should simplify `SlotPoolImpl` and let `BulkSlotProviderImpl` decide how to interpret external request failures and when to fulfill the requests. It could be a follow-up.
   We might want to note that a new slot allocation has failed, but there is often a chance that the pending request can still be fulfilled by already allocated slots.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this(slotSelectionStrategy, slotPool, SystemClock.getInstance());
+	}
+
+	@VisibleForTesting
+	BulkSlotProviderImpl(
+			final SlotSelectionStrategy slotSelectionStrategy,
+			final SlotPool slotPool,
+			final Clock clock) {
+
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+		this.clock = checkNotNull(clock);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(clock);
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request, timeout).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest,
+			final Time timeout) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else {
+			slotFuture = slotPool.requestNewAllocatedSlotWithoutTimeout(
+				slotRequestId,
+				resourceProfile,
+				!physicalSlotRequest.willSlotBeOccupiedIndefinitely());
+		}
+
+		return slotFuture.thenApply(physicalSlot -> new PhysicalSlotRequest.Result(slotRequestId, physicalSlot));
+	}
+
+	private Optional<PhysicalSlot> tryAllocateFromAvailable(
+			final SlotRequestId slotRequestId,
+			final SlotProfile slotProfile) {
+
+		final Collection<SlotSelectionStrategy.SlotInfoAndResources> slotInfoList =
+			slotPool.getAvailableSlotsInformation()
+				.stream()
+				.map(SlotSelectionStrategy.SlotInfoAndResources::fromSingleSlot)
+				.collect(Collectors.toList());
+
+		final Optional<SlotSelectionStrategy.SlotInfoAndLocality> selectedAvailableSlot =
+			slotSelectionStrategy.selectBestSlotForProfile(slotInfoList, slotProfile);
+
+		return selectedAvailableSlot.flatMap(
+			slotInfoAndLocality -> slotPool.allocateAvailableSlot(
+				slotRequestId,
+				slotInfoAndLocality.getSlotInfo().getAllocationId())
+		);
+	}
+
+	private void schedulePendingRequestBulkTimeoutCheck(
+		final PhysicalSlotRequestBulk slotRequestBulk,
+		final Time timeout) {
+
+		componentMainThreadExecutor.schedule(() -> {
+			if (!checkPhysicalSlotRequestBulkTimeout(slotRequestBulk, timeout)) {
+				schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+			}
+		}, timeout.getSize(), timeout.getUnit());
+	}
+
+	/**
+	 * Check the slot request bulk and time out its requests if it has been unfulfillable for too long.
+	 * @param slotRequestBulk bulk of slot requests
+	 * @param slotRequestTimeout indicates how long a pending request can be unfulfillable
+	 * @return true if the slot request bulk is done or timed out, otherwise false
+	 */
+	@VisibleForTesting
+	protected boolean checkPhysicalSlotRequestBulkTimeout(

Review comment:
       Generally, making methods `VisibleForTesting` suggests that there is one more internal concern to test.
   Could we factor out this concern and test it separately?
   
   What if we create a `PhysicalSlotRequestBulkTracker(requests, slotPool)` for each bulk by merging it with `PhysicalSlotRequestBulk`?
   Or `PhysicalSlotRequestBulk` could hold the `unfulfillableSinceTimestamp` internally.
   I am not sure why we need the existing `PhysicalSlotRequestBulkTracker` for _all_ bulk timestamps.
   
   All methods here related to the timeout/fulfillability check could also go into the `PhysicalSlotRequestBulkTracker`.
   `BulkSlotProviderImpl` could schedule `PhysicalSlotRequestBulkTracker#checkPhysicalSlotRequestBulkTimeout` for each bulk.
   `PhysicalSlotRequestBulkTracker#checkPhysicalSlotRequestBulkTimeout` could return FINISHED/FULFILLABLE/TIMEOUT.
   Then `PhysicalSlotRequestBulkTracker#checkPhysicalSlotRequestBulkTimeout` could be tested separately.
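
The result-returning check proposed above could be sketched as a pure function, which is what would make it testable in isolation. Everything in this block is an assumption for illustration: the class `BulkTimeoutCheckSketch`, the parameter names, the extra `UNFULFILLABLE` value for "unfulfillable but still within the timeout", and the `>=` comparison are not taken from the PR.

```java
/** Hypothetical sketch of a tri-state bulk timeout check; not part of Flink. */
final class BulkTimeoutCheckSketch {

	/** FINISHED, FULFILLABLE and TIMEOUT follow the comment; UNFULFILLABLE is an added assumption. */
	enum Result {
		FINISHED,
		FULFILLABLE,
		UNFULFILLABLE,
		TIMEOUT
	}

	private BulkTimeoutCheckSketch() {
	}

	static Result check(
			final boolean allRequestsFulfilled,
			final boolean fulfillableWithCurrentSlots,
			final long unfulfillableSinceMillis,
			final long nowMillis,
			final long timeoutMillis) {

		if (allRequestsFulfilled) {
			// nothing left to wait for, the caller can stop rescheduling the check
			return Result.FINISHED;
		}
		if (fulfillableWithCurrentSlots) {
			// still pending, but the known slots could satisfy the whole bulk
			return Result.FULFILLABLE;
		}
		if (nowMillis - unfulfillableSinceMillis >= timeoutMillis) {
			// unfulfillable for at least the timeout, the caller should fail the bulk
			return Result.TIMEOUT;
		}
		// unfulfillable, but not yet for long enough to give up
		return Result.UNFULFILLABLE;
	}
}
```

A scheduler such as `BulkSlotProviderImpl` would then only need to act on the returned value: stop on FINISHED or TIMEOUT, reschedule otherwise.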

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/AllocatedSlotOccupationTest.java
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.executiongraph.utils.SimpleAckingTaskManagerGateway;
+import org.apache.flink.runtime.jobmanager.scheduler.Locality;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.LogicalSlot;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation;
+import org.apache.flink.util.TestLogger;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Tests whether the slot occupation state of {@link AllocatedSlot} is correct.
+ */
+public class AllocatedSlotOccupationTest extends TestLogger {
+
+	@Test
+	public void testSingleTaskOccupyingSlotIndefinitely() {
+		final PhysicalSlot physicalSlot = createPhysicalSlot();
+		allocateSingleLogicalSlotFromPhysicalSlot(physicalSlot, true);
+
+		assertTrue(physicalSlot.willBeOccupiedIndefinitely());

Review comment:
       ```suggestion
   		assertThat(physicalSlot.willBeOccupiedIndefinitely(), is(true));
   ```
   also in other places

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTrackerTest.java
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.Collections;
+import java.util.concurrent.TimeUnit;
+
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link PhysicalSlotRequestBulkTracker}.
+ */
+public class PhysicalSlotRequestBulkTrackerTest extends TestLogger {
+
+	private ManualClock clock = new ManualClock();
+
+	private PhysicalSlotRequestBulkTracker bulkTracker;
+
+	@Before
+	public void setup() throws Exception {
+		bulkTracker = new PhysicalSlotRequestBulkTracker(clock);
+	}
+
+	@Test
+	public void testBulkTracking() {
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Collections.emptyList());
+
+		assertThat(bulkTracker.isTracked(bulk), is(false));
+
+		bulkTracker.track(bulk);
+
+		assertThat(bulkTracker.isTracked(bulk), is(true));
+		assertThat(bulkTracker.getBulkUnfulfillableSince(bulk), is(clock.relativeTimeMillis()));
+
+		bulkTracker.untrack(bulk);
+
+		assertThat(bulkTracker.isTracked(bulk), is(false));
+	}
+
+	@Test
+	public void testUnfulfillableTimestampWillNotBeOverriddenByFollowingUnfulfillableTimestamp() {
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Collections.emptyList());
+		bulkTracker.track(bulk);
+
+		final long unfulfillableSince = clock.relativeTimeMillis();
+
+		assertThat(bulkTracker.getBulkUnfulfillableSince(bulk), is(unfulfillableSince));
+
+		clock.advanceTime(456, TimeUnit.MILLISECONDS);
+		bulkTracker.markBulkUnfulfillable(bulk, clock.relativeTimeMillis());
+
+		assertThat(bulkTracker.getBulkUnfulfillableSince(bulk), is(unfulfillableSince));
+
+		bulkTracker.markBulkFulfillable(bulk);
+		bulkTracker.markBulkUnfulfillable(bulk, clock.relativeTimeMillis());
+
+		assertThat(bulkTracker.getBulkUnfulfillableSince(bulk), is(clock.relativeTimeMillis()));

Review comment:
       can this be a separate test?

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulk.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+/**
+ * Represents a bulk of physical slot requests.
+ */
+public class PhysicalSlotRequestBulk {
+
+	private final Map<SlotRequestId, ResourceProfile> pendingRequests;
+
+	private final Map<SlotRequestId, AllocationID> fulfilledRequests = new HashMap<>();
+
+	PhysicalSlotRequestBulk(final Collection<PhysicalSlotRequest> physicalSlotRequests) {
+		this.pendingRequests = physicalSlotRequests.stream()
+			.collect(Collectors.toMap(
+				PhysicalSlotRequest::getSlotRequestId,
+				r -> r.getSlotProfile().getPhysicalSlotResourceProfile()));
+	}
+
+	void markRequestFulfilled(final SlotRequestId slotRequestId, final AllocationID allocationID) {

Review comment:
       ```suggestion
   	void fulfillPendingSlotRequest(final SlotRequestId slotRequestId, final AllocationID allocationID) {
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPoolImpl.java
##########
@@ -447,12 +452,40 @@ public void releaseSlot(@Nonnull SlotRequestId slotRequestId, @Nullable Throwabl
 
 		componentMainThreadExecutor.assertRunningInMainThread();
 
+		checkState(batchSlotTimeoutCheckEnabled, "batch slot timeout check is disabled unexpectedly.");
+
 		final PendingRequest pendingRequest = PendingRequest.createBatchRequest(slotRequestId, resourceProfile);
 
 		return requestNewAllocatedSlotInternal(pendingRequest)
 			.thenApply(Function.identity());
 	}
 
+	@Override
+	public CompletableFuture<PhysicalSlot> requestNewAllocatedSlotWithoutTimeout(
+			final SlotRequestId slotRequestId,
+			final ResourceProfile resourceProfile,
+			final boolean isBatchRequest) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		// a slot pool can serve either this kind of request or requestNewAllocatedBatchSlot(...), never both
+		disableBatchSlotTimeoutCheck();

Review comment:
       ```suggestion
   		batchSlotTimeoutCheckEnabled = false;
   ```
   do we need a method for this?







[GitHub] [flink] azagrebin commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438636756



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(
+			this::getAllSlotInfos,
+			SystemClock.getInstance());
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public void cancelSlotRequest(SlotRequestId slotRequestId, Throwable cause) {
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		slotPool.releaseSlot(slotRequestId, cause);
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else if (physicalSlotRequest.willSlotBeOccupiedIndefinitely()) {
+			slotFuture = slotPool.requestNewAllocatedSlot(
+				slotRequestId,
+				resourceProfile,
+				null);
+		} else {
+			slotFuture = slotPool.requestNewAllocatedBatchSlot(
+				slotRequestId,
+				resourceProfile);
+		}

Review comment:
       I meant only a private helper method in `BulkSlotProviderImpl` :)
   It is only a nit anyway.
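
For illustration, the private helper suggested above could look roughly like the sketch below. It is only a sketch: `Pool`, `SlotRequestHelperSketch`, `requestNewSlot` and the `String` slot type are hypothetical stand-ins and not the real Flink `SlotPool` API. The point is just that the streaming/batch branch from the hunk moves into one private method.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

class SlotRequestHelperSketch {

	// Hypothetical stand-in for the slot pool; the real SlotPool methods take more arguments.
	interface Pool {
		CompletableFuture<String> requestNewAllocatedSlot(String slotRequestId);

		CompletableFuture<String> requestNewAllocatedBatchSlot(String slotRequestId);
	}

	private final Pool pool;

	SlotRequestHelperSketch(Pool pool) {
		this.pool = pool;
	}

	// Mirrors the shape of allocatePhysicalSlot: use an available slot if there is one,
	// otherwise delegate the "which kind of new slot" decision to the helper.
	CompletableFuture<String> allocate(
			String slotRequestId,
			Optional<String> availableSlot,
			boolean willSlotBeOccupiedIndefinitely) {

		if (availableSlot.isPresent()) {
			return CompletableFuture.completedFuture(availableSlot.get());
		}
		return requestNewSlot(slotRequestId, willSlotBeOccupiedIndefinitely);
	}

	// The private helper: the only place that knows about the two request flavours.
	private CompletableFuture<String> requestNewSlot(
			String slotRequestId,
			boolean willSlotBeOccupiedIndefinitely) {

		return willSlotBeOccupiedIndefinitely
			? pool.requestNewAllocatedSlot(slotRequestId)
			: pool.requestNewAllocatedBatchSlot(slotRequestId);
	}
}
```

With this shape the caller reads as "available slot or new slot", and the indefinite-occupation flag is interpreted in exactly one place.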




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438660533



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImplTest.java
##########
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutorServiceAdapter;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.ExceptionUtils;
+import org.apache.flink.util.FlinkException;
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+import java.util.Optional;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.hamcrest.Matchers.hasSize;
+import static org.hamcrest.Matchers.instanceOf;
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+import static org.junit.Assert.fail;
+
+/**
+ * Tests for {@link BulkSlotProviderImpl}.
+ */
+public class BulkSlotProviderImplTest extends TestLogger {
+
+	private static final Time TIMEOUT = Time.milliseconds(1000L);
+
+	private static ScheduledExecutorService singleThreadScheduledExecutorService;
+
+	private static ComponentMainThreadExecutor mainThreadExecutor;
+
+	private TestingSlotPoolImpl slotPool;
+
+	private BulkSlotProviderImpl bulkSlotProvider;
+
+	private TestingSlotOwner slotOwner;
+
+	private ManualClock clock;
+
+	@BeforeClass
+	public static void setupClass() {
+		singleThreadScheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
+		mainThreadExecutor = ComponentMainThreadExecutorServiceAdapter.forSingleThreadExecutor(singleThreadScheduledExecutorService);
+	}
+
+	@AfterClass
+	public static void teardownClass() {
+		if (singleThreadScheduledExecutorService != null) {
+			singleThreadScheduledExecutorService.shutdownNow();
+		}
+	}
+
+	@Before
+	public void setup() throws Exception {
+		clock = new ManualClock();
+
+		slotPool = new SlotPoolBuilder(mainThreadExecutor).build();
+
+		bulkSlotProvider = new BulkSlotProviderImpl(LocationPreferenceSlotSelectionStrategy.createDefault(), slotPool);
+		bulkSlotProvider.start(mainThreadExecutor);
+
+		slotOwner = new TestingSlotOwner();

Review comment:
       It can be removed.
   It was used to create a logical slot that occupies a physical slot, but it is no longer needed since we refactored the bulk check logic into the checker class.







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * 35ef82ee0bf1d008719a52e8182f9b63f3eddb11 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2729) 
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * ff97aeedb303fe540ebb3728ed280de414606978 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2818) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] azagrebin commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435880917



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this(slotSelectionStrategy, slotPool, SystemClock.getInstance());
+	}
+
+	@VisibleForTesting
+	BulkSlotProviderImpl(
+			final SlotSelectionStrategy slotSelectionStrategy,
+			final SlotPool slotPool,
+			final Clock clock) {
+
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+		this.clock = checkNotNull(clock);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(clock);
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request, timeout).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest,
+			final Time timeout) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else {
+			slotFuture = slotPool.requestNewAllocatedSlotWithoutTimeout(
+				slotRequestId,
+				resourceProfile,
+				!physicalSlotRequest.willSlotBeOccupiedIndefinitely());
+		}
+
+		return slotFuture.thenApply(physicalSlot -> new PhysicalSlotRequest.Result(slotRequestId, physicalSlot));
+	}
+
+	private Optional<PhysicalSlot> tryAllocateFromAvailable(
+			final SlotRequestId slotRequestId,
+			final SlotProfile slotProfile) {
+
+		final Collection<SlotSelectionStrategy.SlotInfoAndResources> slotInfoList =
+			slotPool.getAvailableSlotsInformation()
+				.stream()
+				.map(SlotSelectionStrategy.SlotInfoAndResources::fromSingleSlot)
+				.collect(Collectors.toList());
+
+		final Optional<SlotSelectionStrategy.SlotInfoAndLocality> selectedAvailableSlot =
+			slotSelectionStrategy.selectBestSlotForProfile(slotInfoList, slotProfile);
+
+		return selectedAvailableSlot.flatMap(
+			slotInfoAndLocality -> slotPool.allocateAvailableSlot(
+				slotRequestId,
+				slotInfoAndLocality.getSlotInfo().getAllocationId())
+		);
+	}
+
+	private void schedulePendingRequestBulkTimeoutCheck(
+		final PhysicalSlotRequestBulk slotRequestBulk,
+		final Time timeout) {
+
+		componentMainThreadExecutor.schedule(() -> {
+			if (!checkPhysicalSlotRequestBulkTimeout(slotRequestBulk, timeout)) {
+				schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+			}
+		}, timeout.getSize(), timeout.getUnit());
+	}
+
+	/**
+	 * Check the slot request bulk and time out its requests if it has been unfulfillable for too long.
+	 * @param slotRequestBulk bulk of slot requests
+	 * @param slotRequestTimeout indicates how long a pending request can be unfulfillable
+	 * @return true if the slot request bulk is done or timed out, otherwise false
+	 */
+	@VisibleForTesting
+	protected boolean checkPhysicalSlotRequestBulkTimeout(

Review comment:
       Is the main use case of `cancelSlotRequest` in `BulkSlotProvider` to release the slot after its task is done? As I understand it, the method is also used to interrupt the request in `SchedulerImpl`, but I would consider renaming it to `releaseSlot` in `BulkSlotProvider` if possible.







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * d3f81cb67e5924bfc9d3e7bbba25af281e6b3102 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3230) 
   * f9ea7bdfe11e677c214913340ef804e3e28c25aa UNKNOWN
   * bdf970c21e07f96589b85949ea7b697ac5c64c36 UNKNOWN
   





[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * a758c3a88a0a6cc2e0b183b20a17fe618f8cbbb9 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2396) 
   





[GitHub] [flink] azagrebin commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435232158



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotProvider.java
##########
@@ -91,6 +92,20 @@
 			allocationTimeout);
 	}
 
+	/**
+	 * Allocates a bulk of physical slots. The allocation will be completed
+	 * normally only when all the requests are fulfilled.
+	 *
+	 * @param physicalSlotRequests requests for physical slots
+	 * @param timeout indicates how long the slot requests may remain unfulfillable
+	 * @return future of the results of slot requests
+	 */
+	default CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(

Review comment:
       I think having a default method in `BulkSlotProvider` is fine. It would be inherited by `SlotProvider`, right? Then there is no need to implement it everywhere. It is also related to making `OneSlotPerExecutionSlotAllocator` depend only on `BulkSlotProvider` in #12256.
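
To make the inheritance argument concrete, here is a minimal sketch with simplified, hypothetical interfaces (`BulkProviderSketch`, `SingleProviderSketch`, `LegacyProvider` and `BulkCapableProvider` are illustrative names, and requests and slots are plain `String`s). It assumes the default simply fails with an `UnsupportedOperationException`; the actual default body in the PR is not shown in this hunk.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for BulkSlotProvider: the bulk method has a default body.
interface BulkProviderSketch {
	default CompletableFuture<Collection<String>> allocatePhysicalSlots(Collection<String> requests) {
		// Assumed default behaviour: report that bulk allocation is unsupported.
		final CompletableFuture<Collection<String>> failed = new CompletableFuture<>();
		failed.completeExceptionally(new UnsupportedOperationException("Bulk allocation is not supported."));
		return failed;
	}
}

// Hypothetical stand-in for SlotProvider: it inherits the default unchanged.
interface SingleProviderSketch extends BulkProviderSketch {
	CompletableFuture<String> allocateSlot(String request);
}

// An existing implementation keeps compiling without touching the bulk method.
class LegacyProvider implements SingleProviderSketch {
	@Override
	public CompletableFuture<String> allocateSlot(String request) {
		return CompletableFuture.completedFuture("slot-for-" + request);
	}
}

// Only implementations that really support bulks override the default.
class BulkCapableProvider extends LegacyProvider {
	@Override
	public CompletableFuture<Collection<String>> allocatePhysicalSlots(Collection<String> requests) {
		final List<String> results = new ArrayList<>(requests.size());
		for (String request : requests) {
			results.add("slot-for-" + request);
		}
		return CompletableFuture.completedFuture(results);
	}
}
```

A caller that is typed against the bulk interface only (as suggested for `OneSlotPerExecutionSlotAllocator`) can then be handed either implementation, and test stubs of the single-slot interface need no extra method.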







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r434404883



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/AllocatedSlotOccupationTest.java
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.executiongraph.utils.SimpleAckingTaskManagerGateway;
+import org.apache.flink.runtime.jobmanager.scheduler.Locality;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.LogicalSlot;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation;
+import org.apache.flink.util.TestLogger;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Tests whether the slot occupation state of {@link AllocatedSlot} is correct.
+ */
+public class AllocatedSlotOccupationTest extends TestLogger {
+
+	@Test
+	public void testSingleTaskOccupyingSlotIndefinitely() {
+		final PhysicalSlot physicalSlot = createPhysicalSlot();
+		allocateSingleLogicalSlotFromPhysicalSlot(physicalSlot, true);
+
+		assertTrue(physicalSlot.willBeOccupiedIndefinitely());

Review comment:
       done.







[GitHub] [flink] azagrebin commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438644325



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTracker.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.util.clock.Clock;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.IdentityHashMap;
+import java.util.Optional;
+import java.util.Set;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * Tracks physical slot request bulks. Once a bulk is registered, a fulfill-ability check for it will be scheduled.
+ */
+class PhysicalSlotRequestBulkTracker {
+
+	private final Supplier<Set<SlotInfo>> slotsRetriever;
+
+	private final Clock clock;
+
+	private final Set<PhysicalSlotRequestBulk> slotRequestBulks;

Review comment:
       Maybe then `PhysicalSlotRequestBulkChecker`?







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * ff97aeedb303fe540ebb3728ed280de414606978 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2818) 
   * a8902557952ea70746d27e88a392c74724784605 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2848) 
   





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r437974413



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this(slotSelectionStrategy, slotPool, SystemClock.getInstance());
+	}
+
+	@VisibleForTesting
+	BulkSlotProviderImpl(
+			final SlotSelectionStrategy slotSelectionStrategy,
+			final SlotPool slotPool,
+			final Clock clock) {
+
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+		this.clock = checkNotNull(clock);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(clock);
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request, timeout).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest,
+			final Time timeout) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else {
+			slotFuture = slotPool.requestNewAllocatedSlotWithoutTimeout(
+				slotRequestId,
+				resourceProfile,
+				!physicalSlotRequest.willSlotBeOccupiedIndefinitely());
+		}
+
+		return slotFuture.thenApply(physicalSlot -> new PhysicalSlotRequest.Result(slotRequestId, physicalSlot));
+	}
+
+	private Optional<PhysicalSlot> tryAllocateFromAvailable(
+			final SlotRequestId slotRequestId,
+			final SlotProfile slotProfile) {
+
+		final Collection<SlotSelectionStrategy.SlotInfoAndResources> slotInfoList =
+			slotPool.getAvailableSlotsInformation()
+				.stream()
+				.map(SlotSelectionStrategy.SlotInfoAndResources::fromSingleSlot)
+				.collect(Collectors.toList());
+
+		final Optional<SlotSelectionStrategy.SlotInfoAndLocality> selectedAvailableSlot =
+			slotSelectionStrategy.selectBestSlotForProfile(slotInfoList, slotProfile);
+
+		return selectedAvailableSlot.flatMap(
+			slotInfoAndLocality -> slotPool.allocateAvailableSlot(
+				slotRequestId,
+				slotInfoAndLocality.getSlotInfo().getAllocationId())
+		);
+	}
+
+	private void schedulePendingRequestBulkTimeoutCheck(
+		final PhysicalSlotRequestBulk slotRequestBulk,
+		final Time timeout) {
+
+		componentMainThreadExecutor.schedule(() -> {
+			if (!checkPhysicalSlotRequestBulkTimeout(slotRequestBulk, timeout)) {
+				schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+			}
+		}, timeout.getSize(), timeout.getUnit());
+	}
+
+	/**
+	 * Checks the slot request bulk and times out its requests if it has been unfulfillable for too long.
+	 * @param slotRequestBulk bulk of slot requests
+	 * @param slotRequestTimeout indicates how long a pending request can be unfulfillable
+	 * @return true if the slot request bulk is done or timed out, otherwise false
+	 */
+	@VisibleForTesting
+	protected boolean checkPhysicalSlotRequestBulkTimeout(

Review comment:
       `cancelSlotRequest` can also cancel pending physical slot requests, so I think it is better than `releaseSlot`.
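
       For readers following the quoted diff: `schedulePendingRequestBulkTimeoutCheck` re-schedules itself until the check reports that the bulk is done or timed out. The sketch below isolates that pattern only. It is a minimal illustration, not the PR's code: `RescheduleSketch`, `scheduleCheck` and `runAll` are hypothetical names, and a plain in-memory queue stands in for `ComponentMainThreadExecutor` so the behaviour is deterministic and has no delays.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a self-rescheduling check; not part of Flink. */
final class RescheduleSketch {

    // Stand-in for a scheduled executor: tasks run in FIFO order, without delays.
    private final Deque<Runnable> scheduled = new ArrayDeque<>();

    // Number of check rounds that have been executed so far.
    int rounds = 0;

    /** Schedules one check round; the round re-schedules itself while the check returns false. */
    void scheduleCheck(BooleanSupplier check) {
        scheduled.add(() -> {
            rounds++;
            // "true" means the bulk is done or timed out, so no further round is needed.
            if (!check.getAsBoolean()) {
                scheduleCheck(check);
            }
        });
    }

    /** Runs scheduled tasks until none are left. */
    void runAll() {
        while (!scheduled.isEmpty()) {
            scheduled.poll().run();
        }
    }

    public static void main(String[] args) {
        RescheduleSketch sketch = new RescheduleSketch();
        int[] calls = {0};
        // Assumed check: reports "done or timed out" on its third invocation.
        sketch.scheduleCheck(() -> ++calls[0] >= 3);
        sketch.runAll();
        System.out.println("check rounds executed: " + sketch.rounds);
    }
}
```

       The point of the boolean return value is that the scheduling code needs no knowledge of why a bulk no longer needs checking; it only decides whether another round is required.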







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r437972735



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPoolImpl.java
##########
@@ -447,12 +452,40 @@ public void releaseSlot(@Nonnull SlotRequestId slotRequestId, @Nullable Throwabl
 
 		componentMainThreadExecutor.assertRunningInMainThread();
 
+		checkState(batchSlotTimeoutCheckEnabled, "batch slot timeout check is disabled unexpectedly.");
+
 		final PendingRequest pendingRequest = PendingRequest.createBatchRequest(slotRequestId, resourceProfile);
 
 		return requestNewAllocatedSlotInternal(pendingRequest)
 			.thenApply(Function.identity());
 	}
 
+	@Override
+	public CompletableFuture<PhysicalSlot> requestNewAllocatedSlotWithoutTimeout(

Review comment:
       done. Added `testIndividualBatchSlotRequestTimeoutCheckIsDisabled`.







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * 35ef82ee0bf1d008719a52e8182f9b63f3eddb11 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2729) 
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r434400229



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTrackerTest.java
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.Collections;
+import java.util.concurrent.TimeUnit;
+
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link PhysicalSlotRequestBulkTracker}.
+ */
+public class PhysicalSlotRequestBulkTrackerTest extends TestLogger {
+
+	private ManualClock clock = new ManualClock();
+
+	private PhysicalSlotRequestBulkTracker bulkTracker;
+
+	@Before
+	public void setup() throws Exception {
+		bulkTracker = new PhysicalSlotRequestBulkTracker(clock);
+	}
+
+	@Test
+	public void testBulkTracking() {
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Collections.emptyList());
+
+		assertThat(bulkTracker.isTracked(bulk), is(false));
+
+		bulkTracker.track(bulk);
+
+		assertThat(bulkTracker.isTracked(bulk), is(true));
+		assertThat(bulkTracker.getBulkUnfulfillableSince(bulk), is(clock.relativeTimeMillis()));
+
+		bulkTracker.untrack(bulk);
+
+		assertThat(bulkTracker.isTracked(bulk), is(false));
+	}
+
+	@Test
+	public void testUnfulfillableTimestampWillNotBeOverriddenByFollowingUnfulfillableTimestamp() {
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Collections.emptyList());
+		bulkTracker.track(bulk);
+
+		final long unfulfillableSince = clock.relativeTimeMillis();
+
+		assertThat(bulkTracker.getBulkUnfulfillableSince(bulk), is(unfulfillableSince));
+
+		clock.advanceTime(456, TimeUnit.MILLISECONDS);
+		bulkTracker.markBulkUnfulfillable(bulk, clock.relativeTimeMillis());
+
+		assertThat(bulkTracker.getBulkUnfulfillableSince(bulk), is(unfulfillableSince));
+
+		bulkTracker.markBulkFulfillable(bulk);
+		bulkTracker.markBulkUnfulfillable(bulk, clock.relativeTimeMillis());
+
+		assertThat(bulkTracker.getBulkUnfulfillableSince(bulk), is(clock.relativeTimeMillis()));

Review comment:
       Ok. Added a test `testMarkBulkUnfulfillable`
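
       The behaviour exercised by the quoted test can be sketched independently of Flink: the first "unfulfillable" timestamp is kept until the bulk is marked fulfillable again, and only then may a new timestamp be recorded. This is a rough sketch under stated assumptions, not the PR's implementation: `UnfulfillableSinceSketch`, the `FULFILLABLE` sentinel and the use of plain `Object` keys are all hypothetical, and timestamps are passed in explicitly instead of being read from a `Clock`.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical sketch of "unfulfillable since" bookkeeping; not Flink's tracker class. */
final class UnfulfillableSinceSketch {

    // Assumed sentinel meaning "currently fulfillable".
    private static final long FULFILLABLE = Long.MAX_VALUE;

    // Identity-based map: two bulks with equal content are still tracked separately.
    private final Map<Object, Long> unfulfillableSince = new IdentityHashMap<>();

    /** Starts tracking; as in the quoted test, a new bulk counts as unfulfillable since registration. */
    void track(Object bulk, long nowMillis) {
        unfulfillableSince.put(bulk, nowMillis);
    }

    void untrack(Object bulk) {
        unfulfillableSince.remove(bulk);
    }

    boolean isTracked(Object bulk) {
        return unfulfillableSince.containsKey(bulk);
    }

    /** Clears the timestamp of a tracked bulk. */
    void markFulfillable(Object bulk) {
        unfulfillableSince.computeIfPresent(bulk, (b, since) -> FULFILLABLE);
    }

    /** Records the timestamp only if the bulk was fulfillable; an earlier timestamp is kept. */
    void markUnfulfillable(Object bulk, long nowMillis) {
        unfulfillableSince.computeIfPresent(
            bulk, (b, since) -> since == FULFILLABLE ? nowMillis : since);
    }

    long getUnfulfillableSince(Object bulk) {
        Long since = unfulfillableSince.get(bulk);
        if (since == null) {
            throw new IllegalStateException("bulk is not tracked");
        }
        return since;
    }

    public static void main(String[] args) {
        UnfulfillableSinceSketch tracker = new UnfulfillableSinceSketch();
        Object bulk = new Object();
        tracker.track(bulk, 0L);
        tracker.markUnfulfillable(bulk, 456L); // the earlier timestamp is kept
        System.out.println(tracker.getUnfulfillableSince(bulk));
        tracker.markFulfillable(bulk);
        tracker.markUnfulfillable(bulk, 456L); // recorded, because the bulk was fulfillable in between
        System.out.println(tracker.getUnfulfillableSince(bulk));
    }
}
```

       Keeping the earliest timestamp is what lets a timeout check measure how long a bulk has been continuously unfulfillable, rather than the time since the most recent check.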







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438652945



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(
+			this::getAllSlotInfos,
+			SystemClock.getInstance());
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public void cancelSlotRequest(SlotRequestId slotRequestId, Throwable cause) {
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		slotPool.releaseSlot(slotRequestId, cause);
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else if (physicalSlotRequest.willSlotBeOccupiedIndefinitely()) {
+			slotFuture = slotPool.requestNewAllocatedSlot(
+				slotRequestId,
+				resourceProfile,
+				null);
+		} else {
+			slotFuture = slotPool.requestNewAllocatedBatchSlot(
+				slotRequestId,
+				resourceProfile);
+		}

Review comment:
       Understood. Sure we can do it.







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * a8902557952ea70746d27e88a392c74724784605 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2848) 
   * 1e045704e218f2cb0fb9dcaa31176c8943f468a5 UNKNOWN
   





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438664517



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTrackerTest.java
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.executiongraph.utils.SimpleAckingTaskManagerGateway;
+import org.apache.flink.runtime.jobmanager.scheduler.Locality;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation;
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link PhysicalSlotRequestBulkTracker}.
+ */
+public class PhysicalSlotRequestBulkTrackerTest extends TestLogger {
+
+	private static final Time TIMEOUT = Time.milliseconds(5000L);
+
+	private ManualClock clock = new ManualClock();
+
+	private PhysicalSlotRequestBulkTracker bulkTracker;
+
+	private Set<PhysicalSlot> slots;
+
+	private Supplier<Set<SlotInfo>> slotsRetriever;
+
+	@Before
+	public void setup() throws Exception {
+		slots = new HashSet<>();
+		slotsRetriever = () -> slots.stream().collect(Collectors.toSet());
+		bulkTracker = new PhysicalSlotRequestBulkTracker(slotsRetriever, clock);
+	}
+
+	@Test
+	public void testTrackBulk() {
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Collections.emptyList());
+
+		assertThat(bulkTracker.isTracked(bulk), is(false));
+
+		bulkTracker.track(bulk);
+
+		assertThat(bulkTracker.isTracked(bulk), is(true));
+		assertThat(bulk.getUnfulfillableSince(), is(clock.relativeTimeMillis()));
+	}
+
+	@Test
+	public void testUntrackBulk() {
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Collections.emptyList());
+
+		bulkTracker.track(bulk);
+		bulkTracker.untrack(bulk);
+
+		assertThat(bulkTracker.isTracked(bulk), is(false));
+	}
+
+	@Test
+	public void testBulkFulfilledOnCheck() {
+		final PhysicalSlotRequest request = createPhysicalSlotRequest();
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Arrays.asList(request));

Review comment:
       Fine.
   I just felt this was more consistent with the other tests, which create bulks with multiple requests.







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435126559



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPoolImpl.java
##########
@@ -447,12 +452,40 @@ public void releaseSlot(@Nonnull SlotRequestId slotRequestId, @Nullable Throwabl
 
 		componentMainThreadExecutor.assertRunningInMainThread();
 
+		checkState(batchSlotTimeoutCheckEnabled, "batch slot timeout check is disabled unexpectedly.");
+
 		final PendingRequest pendingRequest = PendingRequest.createBatchRequest(slotRequestId, resourceProfile);
 
 		return requestNewAllocatedSlotInternal(pendingRequest)
 			.thenApply(Function.identity());
 	}
 
+	@Override
+	public CompletableFuture<PhysicalSlot> requestNewAllocatedSlotWithoutTimeout(

Review comment:
       Agreed, it would be better to explicitly disable the timeout check.
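
   A minimal sketch of what "explicitly disable" could look like, under stated assumptions: `TimeoutCheckSketch`, `Pool` and `requestBatchSlot` are hypothetical names for illustration and are not taken from `SlotPoolImpl`. Only the flag name and the error message mirror the diff above. The idea is that the owner of the pool switches the check off through a dedicated call, rather than a request method flipping the flag as a side effect.

```java
final class TimeoutCheckSketch {

    /** Hypothetical pool; it only models the flag, not real slot allocation. */
    static final class Pool {
        private boolean batchSlotTimeoutCheckEnabled = true;

        // Explicit opt-out: the caller decides once, up front.
        void disableBatchSlotTimeoutCheck() {
            batchSlotTimeoutCheckEnabled = false;
        }

        // Batch requests rely on the timeout check, so they refuse to run without it.
        String requestBatchSlot(String requestId) {
            if (!batchSlotTimeoutCheckEnabled) {
                throw new IllegalStateException("batch slot timeout check is disabled unexpectedly.");
            }
            return "batch:" + requestId;
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool();
        System.out.println(pool.requestBatchSlot("r1"));

        pool.disableBatchSlotTimeoutCheck();
        try {
            pool.requestBatchSlot("r2");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

   With this shape, mixing the two kinds of request fails fast at the call site instead of silently changing the pool's behaviour.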







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * c0483c6347992b8c4412da489b5584879834c396 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2350) 
   * a758c3a88a0a6cc2e0b183b20a17fe618f8cbbb9 UNKNOWN
   





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438657256



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTracker.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.util.clock.Clock;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.IdentityHashMap;
+import java.util.Optional;
+import java.util.Set;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * Tracks physical slot request bulks. Once a bulk is registered, a fulfill-ability check for it will be scheduled.
+ */
+class PhysicalSlotRequestBulkTracker {
+
+	private final Supplier<Set<SlotInfo>> slotsRetriever;
+
+	private final Clock clock;
+
+	private final Set<PhysicalSlotRequestBulk> slotRequestBulks;

Review comment:
       done.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(
+			this::getAllSlotInfos,
+			SystemClock.getInstance());
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public void cancelSlotRequest(SlotRequestId slotRequestId, Throwable cause) {
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		slotPool.releaseSlot(slotRequestId, cause);
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else if (physicalSlotRequest.willSlotBeOccupiedIndefinitely()) {
+			slotFuture = slotPool.requestNewAllocatedSlot(
+				slotRequestId,
+				resourceProfile,
+				null);
+		} else {
+			slotFuture = slotPool.requestNewAllocatedBatchSlot(
+				slotRequestId,
+				resourceProfile);
+		}

Review comment:
       done.







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * f9ea7bdfe11e677c214913340ef804e3e28c25aa UNKNOWN
   * bdf970c21e07f96589b85949ea7b697ac5c64c36 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3274) 
   





[GitHub] [flink] azagrebin commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435231056



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {

Review comment:
       It is fine. Generally, I think we could keep constructors for pure field assignments of actual dependencies (injections), i.e. no logic in constructors, which simplifies mocking dependencies in tests. If some construction logic is required, I think it is more readable to use factory methods whose names reflect what they do, since a constructor cannot have a more descriptive name.
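
   To illustrate the suggestion, here is a small sketch, assuming nothing about the real `BulkSlotProviderImpl`: `FactoryMethodSketch`, `Tracker`, `Provider` and `createWithSystemClock` are hypothetical names. The constructor only assigns the injected dependency, and the wiring logic moves into a factory method whose name says what it builds.

```java
import java.util.Objects;
import java.util.function.LongSupplier;

final class FactoryMethodSketch {

    /** Hypothetical collaborator, standing in for something like a bulk tracker. */
    static final class Tracker {
        private final LongSupplier clock;

        Tracker(LongSupplier clock) {
            this.clock = Objects.requireNonNull(clock);
        }

        long now() {
            return clock.getAsLong();
        }
    }

    /** Hypothetical provider whose constructor contains no construction logic. */
    static final class Provider {
        private final Tracker tracker;

        // Pure field assignment: a test can inject any Tracker it likes.
        Provider(Tracker tracker) {
            this.tracker = Objects.requireNonNull(tracker);
        }

        // Construction logic lives in a descriptively named factory method.
        static Provider createWithSystemClock() {
            return new Provider(new Tracker(System::currentTimeMillis));
        }

        long currentTime() {
            return tracker.now();
        }
    }

    public static void main(String[] args) {
        // Production-style wiring through the factory method.
        Provider production = Provider.createWithSystemClock();
        System.out.println(production.currentTime() > 0);

        // Test-style wiring: inject a tracker driven by a fixed clock.
        Provider underTest = new Provider(new Tracker(() -> 42L));
        System.out.println(underTest.currentTime());
    }
}
```

   The trade-off is one extra method per wiring variant, in exchange for constructors that tests can call directly with manual clocks or stubbed collaborators.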







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435198992



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPoolImpl.java
##########
@@ -447,12 +452,40 @@ public void releaseSlot(@Nonnull SlotRequestId slotRequestId, @Nullable Throwabl
 
 		componentMainThreadExecutor.assertRunningInMainThread();
 
+		checkState(batchSlotTimeoutCheckEnabled, "batch slot timeout check is disabled unexpectedly.");
+
 		final PendingRequest pendingRequest = PendingRequest.createBatchRequest(slotRequestId, resourceProfile);
 
 		return requestNewAllocatedSlotInternal(pendingRequest)
 			.thenApply(Function.identity());
 	}
 
+	@Override
+	public CompletableFuture<PhysicalSlot> requestNewAllocatedSlotWithoutTimeout(
+			final SlotRequestId slotRequestId,
+			final ResourceProfile resourceProfile,
+			final boolean isBatchRequest) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		// a slot pool can serve either this kind of request or requestNewAllocatedBatchSlot(...), never both
+		disableBatchSlotTimeoutCheck();

Review comment:
       There is no `requestNewAllocatedSlotWithoutTimeout()` anymore, so no change is needed.







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * 40df0ed6d44b0e9cd11faa7707490045b48151b0 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2514) 
   * 3371da5790134088bffa988c5a837c34d5d9a443 UNKNOWN
   





[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4fe765587c0423b2d29d5dde946b94edff79398b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2346",
       "triggerID" : "4fe765587c0423b2d29d5dde946b94edff79398b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c0483c6347992b8c4412da489b5584879834c396",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2350",
       "triggerID" : "c0483c6347992b8c4412da489b5584879834c396",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a758c3a88a0a6cc2e0b183b20a17fe618f8cbbb9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2396",
       "triggerID" : "a758c3a88a0a6cc2e0b183b20a17fe618f8cbbb9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "395c6cba8c463cffe2cf3cc70d91e6c6ff0dbb9f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2420",
       "triggerID" : "395c6cba8c463cffe2cf3cc70d91e6c6ff0dbb9f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "40df0ed6d44b0e9cd11faa7707490045b48151b0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2514",
       "triggerID" : "40df0ed6d44b0e9cd11faa7707490045b48151b0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3371da5790134088bffa988c5a837c34d5d9a443",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2621",
       "triggerID" : "3371da5790134088bffa988c5a837c34d5d9a443",
       "triggerType" : "PUSH"
     }, {
       "hash" : "35ef82ee0bf1d008719a52e8182f9b63f3eddb11",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2729",
       "triggerID" : "35ef82ee0bf1d008719a52e8182f9b63f3eddb11",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 35ef82ee0bf1d008719a52e8182f9b63f3eddb11 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2729) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438016144



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTracker.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.util.clock.Clock;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.IdentityHashMap;
+import java.util.Optional;
+import java.util.Set;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * Tracks physical slot request bulks. Once a bulk is registered, a fulfill-ability check for it will be scheduled.
+ */
+class PhysicalSlotRequestBulkTracker {
+
+	private final Supplier<Set<SlotInfo>> slotsRetriever;
+
+	private final Clock clock;
+
+	private final Set<PhysicalSlotRequestBulk> slotRequestBulks;

Review comment:
       Done via af7d4926ee8bb041713feeed10770147509d2db8.
   `PhysicalSlotRequestBulkTracker#createPhysicalSlotRequestBulk` is introduced so that we can reset the unfulfillable timestamp for a newly created bulk with the clock of `PhysicalSlotRequestBulkTracker`.
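
   A minimal sketch of that idea, using hypothetical simplified stand-ins (`Bulk`, `Tracker`, and a `LongSupplier` in place of Flink's `Clock`) rather than the classes from this PR: the tracker owns the clock, so a bulk created through it starts with an unfulfillable timestamp taken from that clock.

```java
import java.util.function.LongSupplier;

// Hypothetical, simplified stand-ins; these are not the classes from the PR.
class BulkCreationSketch {

    static final class Bulk {
        private long unfulfillableSince = Long.MAX_VALUE;

        void markUnfulfillable(long currentTimestamp) {
            unfulfillableSince = currentTimestamp;
        }

        long getUnfulfillableSince() {
            return unfulfillableSince;
        }
    }

    static final class Tracker {
        // Assumption: a LongSupplier stands in for the tracker's Clock.
        private final LongSupplier clock;

        Tracker(LongSupplier clock) {
            this.clock = clock;
        }

        // Creating the bulk through the tracker lets the tracker stamp it
        // with its own clock, so later timeout checks use one time source.
        Bulk createBulk() {
            Bulk bulk = new Bulk();
            bulk.markUnfulfillable(clock.getAsLong());
            return bulk;
        }
    }

    public static void main(String[] args) {
        long[] now = {1_000L};
        Tracker tracker = new Tracker(() -> now[0]);
        Bulk bulk = tracker.createBulk();
        System.out.println(bulk.getUnfulfillableSince());
    }
}
```

   With a manually advanced clock in tests, the timestamp on a new bulk is then deterministic.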







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4fe765587c0423b2d29d5dde946b94edff79398b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2346",
       "triggerID" : "4fe765587c0423b2d29d5dde946b94edff79398b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c0483c6347992b8c4412da489b5584879834c396",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2350",
       "triggerID" : "c0483c6347992b8c4412da489b5584879834c396",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a758c3a88a0a6cc2e0b183b20a17fe618f8cbbb9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2396",
       "triggerID" : "a758c3a88a0a6cc2e0b183b20a17fe618f8cbbb9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "395c6cba8c463cffe2cf3cc70d91e6c6ff0dbb9f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2420",
       "triggerID" : "395c6cba8c463cffe2cf3cc70d91e6c6ff0dbb9f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "40df0ed6d44b0e9cd11faa7707490045b48151b0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2514",
       "triggerID" : "40df0ed6d44b0e9cd11faa7707490045b48151b0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3371da5790134088bffa988c5a837c34d5d9a443",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2621",
       "triggerID" : "3371da5790134088bffa988c5a837c34d5d9a443",
       "triggerType" : "PUSH"
     }, {
       "hash" : "35ef82ee0bf1d008719a52e8182f9b63f3eddb11",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2729",
       "triggerID" : "35ef82ee0bf1d008719a52e8182f9b63f3eddb11",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cd3fc98c034fdc61235d9109c05b4f55d7423746",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "cd3fc98c034fdc61235d9109c05b4f55d7423746",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ff97aeedb303fe540ebb3728ed280de414606978",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ff97aeedb303fe540ebb3728ed280de414606978",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 35ef82ee0bf1d008719a52e8182f9b63f3eddb11 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2729) 
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * ff97aeedb303fe540ebb3728ed280de414606978 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435674952



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this(slotSelectionStrategy, slotPool, SystemClock.getInstance());
+	}
+
+	@VisibleForTesting
+	BulkSlotProviderImpl(
+			final SlotSelectionStrategy slotSelectionStrategy,
+			final SlotPool slotPool,
+			final Clock clock) {
+
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+		this.clock = checkNotNull(clock);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(clock);
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request, timeout).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest,
+			final Time timeout) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else {
+			slotFuture = slotPool.requestNewAllocatedSlotWithoutTimeout(

Review comment:
       Agreed. The design should account for both "fail-fast on UnfulfillableSlotRequestException" and "do not fail a streaming slot request if it is fulfillable". The related error messages should also be available for troubleshooting.







[GitHub] [flink] zhuzhurk commented on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-642539851


   Thanks for reviewing, @azagrebin!
   I will squash the fix commits and rebase onto the latest master for a final CI pass.





[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4fe765587c0423b2d29d5dde946b94edff79398b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2346",
       "triggerID" : "4fe765587c0423b2d29d5dde946b94edff79398b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c0483c6347992b8c4412da489b5584879834c396",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2350",
       "triggerID" : "c0483c6347992b8c4412da489b5584879834c396",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a758c3a88a0a6cc2e0b183b20a17fe618f8cbbb9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2396",
       "triggerID" : "a758c3a88a0a6cc2e0b183b20a17fe618f8cbbb9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "395c6cba8c463cffe2cf3cc70d91e6c6ff0dbb9f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2420",
       "triggerID" : "395c6cba8c463cffe2cf3cc70d91e6c6ff0dbb9f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "40df0ed6d44b0e9cd11faa7707490045b48151b0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2514",
       "triggerID" : "40df0ed6d44b0e9cd11faa7707490045b48151b0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3371da5790134088bffa988c5a837c34d5d9a443",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2621",
       "triggerID" : "3371da5790134088bffa988c5a837c34d5d9a443",
       "triggerType" : "PUSH"
     }, {
       "hash" : "35ef82ee0bf1d008719a52e8182f9b63f3eddb11",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2729",
       "triggerID" : "35ef82ee0bf1d008719a52e8182f9b63f3eddb11",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cd3fc98c034fdc61235d9109c05b4f55d7423746",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "cd3fc98c034fdc61235d9109c05b4f55d7423746",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ff97aeedb303fe540ebb3728ed280de414606978",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2818",
       "triggerID" : "ff97aeedb303fe540ebb3728ed280de414606978",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a8902557952ea70746d27e88a392c74724784605",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2848",
       "triggerID" : "a8902557952ea70746d27e88a392c74724784605",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e045704e218f2cb0fb9dcaa31176c8943f468a5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3141",
       "triggerID" : "1e045704e218f2cb0fb9dcaa31176c8943f468a5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "af7d4926ee8bb041713feeed10770147509d2db8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3154",
       "triggerID" : "af7d4926ee8bb041713feeed10770147509d2db8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9a9a40e5fedb82c5602b84f59894d8e0ee5b679",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3219",
       "triggerID" : "a9a9a40e5fedb82c5602b84f59894d8e0ee5b679",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * a9a9a40e5fedb82c5602b84f59894d8e0ee5b679 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3219) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zhuzhurk commented on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-642719717


   Merging.





[GitHub] [flink] azagrebin commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r437918968



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPoolImpl.java
##########
@@ -447,12 +452,40 @@ public void releaseSlot(@Nonnull SlotRequestId slotRequestId, @Nullable Throwabl
 
 		componentMainThreadExecutor.assertRunningInMainThread();
 
+		checkState(batchSlotTimeoutCheckEnabled, "batch slot timeout check is disabled unexpectedly.");
+
 		final PendingRequest pendingRequest = PendingRequest.createBatchRequest(slotRequestId, resourceProfile);
 
 		return requestNewAllocatedSlotInternal(pendingRequest)
 			.thenApply(Function.identity());
 	}
 
+	@Override
+	public CompletableFuture<PhysicalSlot> requestNewAllocatedSlotWithoutTimeout(

Review comment:
       I do not see where we call `disableBatchSlotTimeoutCheck` now.
   Should we add the call to `BulkSlotProviderImpl#start`?
   Could we also add a test for it?
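
   A rough sketch of what I have in mind, with hypothetical simplified stand-ins (`Pool`, `Provider`) instead of the real `SlotPool` and `BulkSlotProviderImpl`; the real `start(...)` takes a `ComponentMainThreadExecutor`, which is left out here:

```java
// Hypothetical, simplified stand-ins; these are not the classes from the PR.
class StartDisablesTimeoutCheckSketch {

    static final class Pool {
        private boolean batchSlotTimeoutCheckEnabled = true;

        void disableBatchSlotTimeoutCheck() {
            batchSlotTimeoutCheckEnabled = false;
        }

        boolean isBatchSlotTimeoutCheckEnabled() {
            return batchSlotTimeoutCheckEnabled;
        }
    }

    static final class Provider {
        private final Pool pool;

        Provider(Pool pool) {
            this.pool = pool;
        }

        // Assumption: starting the provider is the single place where the
        // pool's own batch timeout check is switched off.
        void start() {
            pool.disableBatchSlotTimeoutCheck();
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool();
        new Provider(pool).start();
        System.out.println(pool.isBatchSlotTimeoutCheckEnabled());
    }
}
```

   A test could then assert that the check is enabled before `start` and disabled after it.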







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r437966502



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(
+			this::getAllSlotInfos,
+			SystemClock.getInstance());
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public void cancelSlotRequest(SlotRequestId slotRequestId, Throwable cause) {
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		slotPool.releaseSlot(slotRequestId, cause);
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else if (physicalSlotRequest.willSlotBeOccupiedIndefinitely()) {
+			slotFuture = slotPool.requestNewAllocatedSlot(
+				slotRequestId,
+				resourceProfile,
+				null);
+		} else {
+			slotFuture = slotPool.requestNewAllocatedBatchSlot(
+				slotRequestId,
+				resourceProfile);
+		}

Review comment:
       Maybe not right now, given that this change does not require it.
   The only difference between `requestNewAllocatedSlot` and `requestNewAllocatedBatchSlot` is whether the request fails fast on normal allocation failures (i.e. failures that are not `UnfulfillableSlotRequestException`).
   I think we can remove `requestNewAllocatedBatchSlot` and stop differentiating batch and streaming requests in `SlotPool` once we disable fail-fast for streaming slot requests on normal allocation failures. That will, however, change behavior and will need a discussion on the community ML, I think.
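
   To illustrate the distinction described above, here is a minimal sketch with hypothetical stand-ins (`PendingRequest`, `UnfulfillableException`, `onAllocationFailure`); it is not the `SlotPoolImpl` code, only the decision as stated in this comment: an unfulfillable failure fails any request, while a normal failure fails only non-batch requests.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical, simplified stand-ins; these are not the classes from the PR.
class FailFastSketch {

    // Stand-in for UnfulfillableSlotRequestException.
    static final class UnfulfillableException extends RuntimeException {
        UnfulfillableException(String message) {
            super(message);
        }
    }

    static final class PendingRequest {
        final boolean isBatchRequest;
        final CompletableFuture<String> slotFuture = new CompletableFuture<>();

        PendingRequest(boolean isBatchRequest) {
            this.isBatchRequest = isBatchRequest;
        }
    }

    // Returns true if the request was failed, false if it stays pending.
    static boolean onAllocationFailure(PendingRequest request, Throwable cause) {
        boolean unfulfillable = cause instanceof UnfulfillableException;
        if (unfulfillable || !request.isBatchRequest) {
            // Fail fast: streaming requests on any failure, all requests
            // when the resource requirement can never be satisfied.
            request.slotFuture.completeExceptionally(cause);
            return true;
        }
        // A batch request ignores normal failures and keeps waiting.
        return false;
    }

    public static void main(String[] args) {
        PendingRequest batch = new PendingRequest(true);
        PendingRequest streaming = new PendingRequest(false);
        Throwable normalFailure = new RuntimeException("allocation failed");
        System.out.println(onAllocationFailure(batch, normalFailure));
        System.out.println(onAllocationFailure(streaming, normalFailure));
    }
}
```

   Dropping the batch/streaming distinction would amount to removing the `!request.isBatchRequest` branch, which is the behavior change mentioned above.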







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438668891



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTrackerTest.java
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.executiongraph.utils.SimpleAckingTaskManagerGateway;
+import org.apache.flink.runtime.jobmanager.scheduler.Locality;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation;
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link PhysicalSlotRequestBulkTracker}.
+ */
+public class PhysicalSlotRequestBulkTrackerTest extends TestLogger {
+
+	private static final Time TIMEOUT = Time.milliseconds(5000L);
+
+	private ManualClock clock = new ManualClock();
+
+	private PhysicalSlotRequestBulkTracker bulkTracker;
+
+	private Set<PhysicalSlot> slots;
+
+	private Supplier<Set<SlotInfo>> slotsRetriever;
+
+	@Before
+	public void setup() throws Exception {
+		slots = new HashSet<>();
+		slotsRetriever = () -> slots.stream().collect(Collectors.toSet());

Review comment:
       done.

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTrackerTest.java
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.executiongraph.utils.SimpleAckingTaskManagerGateway;
+import org.apache.flink.runtime.jobmanager.scheduler.Locality;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation;
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link PhysicalSlotRequestBulkTracker}.
+ */
+public class PhysicalSlotRequestBulkTrackerTest extends TestLogger {
+
+	private static final Time TIMEOUT = Time.milliseconds(5000L);
+
+	private ManualClock clock = new ManualClock();

Review comment:
       done.

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTrackerTest.java
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.executiongraph.utils.SimpleAckingTaskManagerGateway;
+import org.apache.flink.runtime.jobmanager.scheduler.Locality;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation;
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link PhysicalSlotRequestBulkTracker}.
+ */
+public class PhysicalSlotRequestBulkTrackerTest extends TestLogger {
+
+	private static final Time TIMEOUT = Time.milliseconds(5000L);
+
+	private ManualClock clock = new ManualClock();
+
+	private PhysicalSlotRequestBulkTracker bulkTracker;
+
+	private Set<PhysicalSlot> slots;
+
+	private Supplier<Set<SlotInfo>> slotsRetriever;
+
+	@Before
+	public void setup() throws Exception {
+		slots = new HashSet<>();
+		slotsRetriever = () -> slots.stream().collect(Collectors.toSet());
+		bulkTracker = new PhysicalSlotRequestBulkTracker(slotsRetriever, clock);
+	}
+
+	@Test
+	public void testTrackBulk() {
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Collections.emptyList());
+
+		assertThat(bulkTracker.isTracked(bulk), is(false));
+
+		bulkTracker.track(bulk);
+
+		assertThat(bulkTracker.isTracked(bulk), is(true));
+		assertThat(bulk.getUnfulfillableSince(), is(clock.relativeTimeMillis()));
+	}
+
+	@Test
+	public void testUntrackBulk() {
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Collections.emptyList());
+
+		bulkTracker.track(bulk);
+		bulkTracker.untrack(bulk);
+
+		assertThat(bulkTracker.isTracked(bulk), is(false));
+	}
+
+	@Test
+	public void testBulkFulfilledOnCheck() {
+		final PhysicalSlotRequest request = createPhysicalSlotRequest();
+		final PhysicalSlotRequestBulk bulk = new PhysicalSlotRequestBulk(Arrays.asList(request));

Review comment:
       done.

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImplTest.java
##########
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutorServiceAdapter;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.ExceptionUtils;
+import org.apache.flink.util.FlinkException;
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+import java.util.Optional;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.hamcrest.Matchers.hasSize;
+import static org.hamcrest.Matchers.instanceOf;
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+import static org.junit.Assert.fail;
+
+/**
+ * Tests for {@link BulkSlotProviderImpl}.
+ */
+public class BulkSlotProviderImplTest extends TestLogger {
+
+	private static final Time TIMEOUT = Time.milliseconds(1000L);
+
+	private static ScheduledExecutorService singleThreadScheduledExecutorService;
+
+	private static ComponentMainThreadExecutor mainThreadExecutor;
+
+	private TestingSlotPoolImpl slotPool;
+
+	private BulkSlotProviderImpl bulkSlotProvider;
+
+	private TestingSlotOwner slotOwner;
+
+	private ManualClock clock;
+
+	@BeforeClass
+	public static void setupClass() {
+		singleThreadScheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
+		mainThreadExecutor = ComponentMainThreadExecutorServiceAdapter.forSingleThreadExecutor(singleThreadScheduledExecutorService);
+	}
+
+	@AfterClass
+	public static void teardownClass() {
+		if (singleThreadScheduledExecutorService != null) {
+			singleThreadScheduledExecutorService.shutdownNow();
+		}
+	}
+
+	@Before
+	public void setup() throws Exception {
+		clock = new ManualClock();
+
+		slotPool = new SlotPoolBuilder(mainThreadExecutor).build();
+
+		bulkSlotProvider = new BulkSlotProviderImpl(LocationPreferenceSlotSelectionStrategy.createDefault(), slotPool);
+		bulkSlotProvider.start(mainThreadExecutor);
+
+		slotOwner = new TestingSlotOwner();
+		slotOwner.setReturnAllocatedSlotConsumer(
+			logicalSlot ->
+				slotPool.releaseSlot(
+					logicalSlot.getSlotRequestId(),
+					new FlinkException("Slot is being returned to the SlotPool.")));
+	}
+
+	@After
+	public void teardown() {
+		CompletableFuture.runAsync(() -> slotPool.close(), mainThreadExecutor).join();
+	}
+
+	@Test
+	public void testBulkSlotAllocationFulfilledWithAvailableSlots() throws Exception {
+		final PhysicalSlotRequest request1 = createPhysicalSlotRequest();
+		final PhysicalSlotRequest request2 = createPhysicalSlotRequest();
+		final List<PhysicalSlotRequest> requests = Arrays.asList(request1, request2);
+
+		addSlotToSlotPool();
+		addSlotToSlotPool();
+
+		final CompletableFuture<Collection<PhysicalSlotRequest.Result>> slotFutures = allocateSlots(requests);
+
+		final Collection<PhysicalSlotRequest.Result> results = slotFutures.get(TIMEOUT.getSize(), TIMEOUT.getUnit());
+		final Collection<SlotRequestId> resultRequestIds = results.stream()
+			.map(PhysicalSlotRequest.Result::getSlotRequestId)
+			.collect(Collectors.toList());
+
+		assertThat(resultRequestIds, containsInAnyOrder(request1.getSlotRequestId(), request2.getSlotRequestId()));
+	}
+
+	@Test
+	public void testBulkSlotAllocationFulfilledWithNewSlots() {
+		final List<PhysicalSlotRequest> requests = Arrays.asList(
+			createPhysicalSlotRequest(),
+			createPhysicalSlotRequest());
+		final CompletableFuture<Collection<PhysicalSlotRequest.Result>> slotFutures = allocateSlots(requests);
+
+		addSlotToSlotPool();
+
+		assertThat(slotFutures.isDone(), is(false));
+
+		addSlotToSlotPool();
+
+		assertThat(slotFutures.isDone(), is(true));
+		assertThat(slotFutures.isCompletedExceptionally(), is(false));
+	}
+
+	@Test
+	public void testBulkSlotAllocationTimeoutsIfUnfulfillable() {
+		final List<PhysicalSlotRequest> requests = Arrays.asList(
+			createPhysicalSlotRequest(),
+			createPhysicalSlotRequest());
+		final CompletableFuture<Collection<PhysicalSlotRequest.Result>> slotFutures = allocateSlots(requests);
+
+		addSlotToSlotPool();
+
+		assertThat(slotPool.getAllocatedSlots().listSlotInfo(), hasSize(1));
+
+		clock.advanceTime(TIMEOUT.toMilliseconds() + 1L, TimeUnit.MILLISECONDS);
+
+		try {
+			// wait until the requests have timed out
+			slotFutures.get();
+			fail("Expected that the slot futures time out.");
+		} catch (Exception e) {
+			final Optional<Throwable> cause = ExceptionUtils.findThrowableWithMessage(
+				e,
+				"Slot request bulk is not fulfillable!");
+			assertThat(cause.isPresent(), is(true));
+			assertThat(cause.get(), instanceOf(TimeoutException.class));
+		}
+	}
+
+	@Test
+	public void testFailedBulkSlotAllocationReleasesAllocatedSlot() {
+		final List<PhysicalSlotRequest> requests = Arrays.asList(
+			createPhysicalSlotRequest(),
+			createPhysicalSlotRequest());
+		final CompletableFuture<Collection<PhysicalSlotRequest.Result>> slotFutures = allocateSlots(requests);
+
+		addSlotToSlotPool();
+
+		assertThat(slotPool.getAllocatedSlots().listSlotInfo(), hasSize(1));
+
+		clock.advanceTime(TIMEOUT.toMilliseconds() + 1L, TimeUnit.MILLISECONDS);
+
+		try {
+			// wait until the requests have timed out
+			slotFutures.get();
+			fail("Expected that the slot futures time out.");
+		} catch (Exception e) {
+			// expected
+		}

Review comment:
       done.

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImplTest.java
##########
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutorServiceAdapter;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.ExceptionUtils;
+import org.apache.flink.util.FlinkException;
+import org.apache.flink.util.TestLogger;
+import org.apache.flink.util.clock.ManualClock;
+
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+import java.util.Optional;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.hamcrest.Matchers.hasSize;
+import static org.hamcrest.Matchers.instanceOf;
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+import static org.junit.Assert.fail;
+
+/**
+ * Tests for {@link BulkSlotProviderImpl}.
+ */
+public class BulkSlotProviderImplTest extends TestLogger {
+
+	private static final Time TIMEOUT = Time.milliseconds(1000L);
+
+	private static ScheduledExecutorService singleThreadScheduledExecutorService;
+
+	private static ComponentMainThreadExecutor mainThreadExecutor;
+
+	private TestingSlotPoolImpl slotPool;
+
+	private BulkSlotProviderImpl bulkSlotProvider;
+
+	private TestingSlotOwner slotOwner;
+
+	private ManualClock clock;
+
+	@BeforeClass
+	public static void setupClass() {
+		singleThreadScheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
+		mainThreadExecutor = ComponentMainThreadExecutorServiceAdapter.forSingleThreadExecutor(singleThreadScheduledExecutorService);
+	}
+
+	@AfterClass
+	public static void teardownClass() {
+		if (singleThreadScheduledExecutorService != null) {
+			singleThreadScheduledExecutorService.shutdownNow();
+		}
+	}
+
+	@Before
+	public void setup() throws Exception {
+		clock = new ManualClock();
+
+		slotPool = new SlotPoolBuilder(mainThreadExecutor).build();
+
+		bulkSlotProvider = new BulkSlotProviderImpl(LocationPreferenceSlotSelectionStrategy.createDefault(), slotPool);
+		bulkSlotProvider.start(mainThreadExecutor);
+
+		slotOwner = new TestingSlotOwner();

Review comment:
       done.







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * ff97aeedb303fe540ebb3728ed280de414606978 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2818) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * d3f81cb67e5924bfc9d3e7bbba25af281e6b3102 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3230) 
   * f9ea7bdfe11e677c214913340ef804e3e28c25aa UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * 3371da5790134088bffa988c5a837c34d5d9a443 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2621) 
   * 35ef82ee0bf1d008719a52e8182f9b63f3eddb11 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2729) 
   





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r437958642



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPoolImpl.java
##########
@@ -447,12 +452,40 @@ public void releaseSlot(@Nonnull SlotRequestId slotRequestId, @Nullable Throwabl
 
 		componentMainThreadExecutor.assertRunningInMainThread();
 
+		checkState(batchSlotTimeoutCheckEnabled, "batch slot timeout check is disabled unexpectedly.");
+
 		final PendingRequest pendingRequest = PendingRequest.createBatchRequest(slotRequestId, resourceProfile);
 
 		return requestNewAllocatedSlotInternal(pendingRequest)
 			.thenApply(Function.identity());
 	}
 
+	@Override
+	public CompletableFuture<PhysicalSlot> requestNewAllocatedSlotWithoutTimeout(

Review comment:
       Yes, that was my mistake.
   I will call it in `BulkSlotProviderImpl#start` and add a test.
   







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * af7d4926ee8bb041713feeed10770147509d2db8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3154) 
   * a9a9a40e5fedb82c5602b84f59894d8e0ee5b679 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3219) 
   





[GitHub] [flink] azagrebin commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
azagrebin commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r437926403



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTracker.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.util.clock.Clock;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.IdentityHashMap;
+import java.util.Optional;
+import java.util.Set;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * Tracks physical slot request bulks. Once a bulk is registered, a fulfill-ability check for it will be scheduled.
+ */
+class PhysicalSlotRequestBulkTracker {
+
+	private final Supplier<Set<SlotInfo>> slotsRetriever;
+
+	private final Clock clock;
+
+	private final Set<PhysicalSlotRequestBulk> slotRequestBulks;
+
+	PhysicalSlotRequestBulkTracker(final Supplier<Set<SlotInfo>> slotsRetriever, final Clock clock) {
+		this.slotsRetriever = checkNotNull(slotsRetriever);
+		this.clock = checkNotNull(clock);
+		this.slotRequestBulks = Collections.newSetFromMap(new IdentityHashMap<>());
+	}
+
+	void track(final PhysicalSlotRequestBulk bulk) {
+		slotRequestBulks.add(bulk);
+
+		bulk.markUnfulfillable(clock.relativeTimeMillis());
+	}
+
+	void untrack(final PhysicalSlotRequestBulk bulk) {
+		slotRequestBulks.remove(bulk);
+	}
+
+	@VisibleForTesting
+	boolean isTracked(final PhysicalSlotRequestBulk bulk) {
+		return slotRequestBulks.contains(bulk);
+	}
+
+	/**
+	 * Check the slot request bulk and timeout its requests if it has been unfilfillable for too long.
+	 * @param slotRequestBulk bulk of slot requests
+	 * @param slotRequestTimeout indicates how long a pending request can be unfilfillable

Review comment:
       ```suggestion
   	 * @param slotRequestTimeout indicates how long a pending request can be unfulfillable
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(
+			this::getAllSlotInfos,
+			SystemClock.getInstance());
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public void cancelSlotRequest(SlotRequestId slotRequestId, Throwable cause) {
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		slotPool.releaseSlot(slotRequestId, cause);
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else if (physicalSlotRequest.willSlotBeOccupiedIndefinitely()) {
+			slotFuture = slotPool.requestNewAllocatedSlot(
+				slotRequestId,
+				resourceProfile,
+				null);
+		} else {
+			slotFuture = slotPool.requestNewAllocatedBatchSlot(
+				slotRequestId,
+				resourceProfile);
+		}
+
+		return slotFuture.thenApply(physicalSlot -> new PhysicalSlotRequest.Result(slotRequestId, physicalSlot));
+	}
+
+	private Optional<PhysicalSlot> tryAllocateFromAvailable(
+			final SlotRequestId slotRequestId,
+			final SlotProfile slotProfile) {
+
+		final Collection<SlotSelectionStrategy.SlotInfoAndResources> slotInfoList =
+			slotPool.getAvailableSlotsInformation()
+				.stream()
+				.map(SlotSelectionStrategy.SlotInfoAndResources::fromSingleSlot)
+				.collect(Collectors.toList());
+
+		final Optional<SlotSelectionStrategy.SlotInfoAndLocality> selectedAvailableSlot =
+			slotSelectionStrategy.selectBestSlotForProfile(slotInfoList, slotProfile);
+
+		return selectedAvailableSlot.flatMap(
+			slotInfoAndLocality -> slotPool.allocateAvailableSlot(
+				slotRequestId,
+				slotInfoAndLocality.getSlotInfo().getAllocationId())
+		);
+	}
+
+	private void schedulePendingRequestBulkTimeoutCheck(
+			final PhysicalSlotRequestBulk slotRequestBulk,
+			final Time timeout) {
+
+		componentMainThreadExecutor.schedule(() -> {
+			final PhysicalSlotRequestBulkTracker.TimeoutCheckResult result =
+				slotRequestBulkTracker.checkPhysicalSlotRequestBulkTimeout(slotRequestBulk, timeout);
+
+			switch (result) {
+				case PENDING:
+					//re-schedule the timeout check
+					schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+					break;
+				case TIMEOUT:
+					timeoutSlotRequestBulk(slotRequestBulk);
+					break;
+				default: // no action to take
+			}
+		}, timeout.getSize(), timeout.getUnit());
+	}
+
+	private void timeoutSlotRequestBulk(final PhysicalSlotRequestBulk slotRequestBulk) {
+		final Exception cause = new TimeoutException("Slot request bulk is not fulfillable!");
+		// pending requests must be canceled first otherwise they might be fulfilled by
+		// allocated slots released from this bulk
+		for (SlotRequestId slotRequestId : slotRequestBulk.getPendingRequests().keySet()) {
+			cancelSlotRequest(slotRequestId, cause);
+		}
+		for (SlotRequestId slotRequestId : slotRequestBulk.getFulfilledRequests().keySet()) {
+			cancelSlotRequest(slotRequestId, cause);
+		}
+	}
+
+	private Set<SlotInfo> getAllSlotInfos() {
+		return Stream
+			.concat(
+				slotPool.getAvailableSlotsInformation().stream(),
+				slotPool.getAllocatedSlotsInformation().stream())
+			.collect(Collectors.toSet());
+	}

Review comment:
       I thought we could actually move `timeoutSlotRequestBulk` and `getAllSlotInfos` into `PhysicalSlotRequestBulkTracker` and pass `slotPool` to it. Would this simplify multithreading in tests?
   For multithreading, we would then only need to check whether `BulkSlotProviderImpl` properly schedules `checkPhysicalSlotRequestBulkTimeout`.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(
+			this::getAllSlotInfos,
+			SystemClock.getInstance());
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public void cancelSlotRequest(SlotRequestId slotRequestId, Throwable cause) {
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		slotPool.releaseSlot(slotRequestId, cause);
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else if (physicalSlotRequest.willSlotBeOccupiedIndefinitely()) {
+			slotFuture = slotPool.requestNewAllocatedSlot(
+				slotRequestId,
+				resourceProfile,
+				null);
+		} else {
+			slotFuture = slotPool.requestNewAllocatedBatchSlot(
+				slotRequestId,
+				resourceProfile);
+		}

Review comment:
       nit: maybe extract one more method, e.g. `slotFuture = requestNewSlot(willSlotBeOccupiedIndefinitely, slotRequestId, resourceProfile)`?
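
   A rough sketch of what such a helper might look like. This is only an illustration: the `SlotPool` interface below is a trimmed, hypothetical stand-in (plain `String` replaces `SlotRequestId`, `ResourceProfile`, and `PhysicalSlot`), `FakeSlotPool` exists only for the demo, and `requestNewSlot` is the name proposed in this comment rather than code from the PR.

```java
import java.util.concurrent.CompletableFuture;

public class RequestNewSlotSketch {

	/** Hypothetical, trimmed stand-in for the two SlotPool methods used in the diff. */
	interface SlotPool {
		CompletableFuture<String> requestNewAllocatedSlot(
			String slotRequestId, String resourceProfile, Long timeoutMillis);

		CompletableFuture<String> requestNewAllocatedBatchSlot(
			String slotRequestId, String resourceProfile);
	}

	/** Demo-only pool that completes immediately and tags which path was taken. */
	static final class FakeSlotPool implements SlotPool {
		@Override
		public CompletableFuture<String> requestNewAllocatedSlot(
				String slotRequestId, String resourceProfile, Long timeoutMillis) {
			return CompletableFuture.completedFuture("streaming:" + slotRequestId);
		}

		@Override
		public CompletableFuture<String> requestNewAllocatedBatchSlot(
				String slotRequestId, String resourceProfile) {
			return CompletableFuture.completedFuture("batch:" + slotRequestId);
		}
	}

	/** The extracted helper: picks the request path based on the occupation flag. */
	static CompletableFuture<String> requestNewSlot(
			SlotPool slotPool,
			boolean willSlotBeOccupiedIndefinitely,
			String slotRequestId,
			String resourceProfile) {

		if (willSlotBeOccupiedIndefinitely) {
			// Mirrors the diff: no timeout is passed on this path.
			return slotPool.requestNewAllocatedSlot(slotRequestId, resourceProfile, null);
		}
		return slotPool.requestNewAllocatedBatchSlot(slotRequestId, resourceProfile);
	}

	public static void main(String[] args) {
		SlotPool pool = new FakeSlotPool();
		System.out.println(requestNewSlot(pool, true, "r1", "profile").join());
		System.out.println(requestNewSlot(pool, false, "r2", "profile").join());
	}
}
```

   With a helper of this shape, `allocatePhysicalSlot` would only need to choose between the already available slot and `requestNewSlot(...)`.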

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTracker.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.util.clock.Clock;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.IdentityHashMap;
+import java.util.Optional;
+import java.util.Set;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * Tracks physical slot request bulks. Once a bulk is registered, a fulfill-ability check for it will be scheduled.
+ */
+class PhysicalSlotRequestBulkTracker {
+
+	private final Supplier<Set<SlotInfo>> slotsRetriever;
+
+	private final Clock clock;
+
+	private final Set<PhysicalSlotRequestBulk> slotRequestBulks;
+
+	PhysicalSlotRequestBulkTracker(final Supplier<Set<SlotInfo>> slotsRetriever, final Clock clock) {
+		this.slotsRetriever = checkNotNull(slotsRetriever);
+		this.clock = checkNotNull(clock);
+		this.slotRequestBulks = Collections.newSetFromMap(new IdentityHashMap<>());
+	}
+
+	void track(final PhysicalSlotRequestBulk bulk) {
+		slotRequestBulks.add(bulk);
+
+		bulk.markUnfulfillable(clock.relativeTimeMillis());
+	}
+
+	void untrack(final PhysicalSlotRequestBulk bulk) {
+		slotRequestBulks.remove(bulk);
+	}
+
+	@VisibleForTesting
+	boolean isTracked(final PhysicalSlotRequestBulk bulk) {
+		return slotRequestBulks.contains(bulk);
+	}
+
+	/**
+	 * Check the slot request bulk and timeout its requests if it has been unfilfillable for too long.
+	 * @param slotRequestBulk bulk of slot requests
+	 * @param slotRequestTimeout indicates how long a pending request can be unfilfillable
+	 * @return result of the check, indicating the bulk is fulfilled, still pending, or timed out
+	 */
+	TimeoutCheckResult checkPhysicalSlotRequestBulkTimeout(
+			final PhysicalSlotRequestBulk slotRequestBulk,
+			final Time slotRequestTimeout) {
+
+		checkState(isTracked(slotRequestBulk));
+
+		if (slotRequestBulk.getPendingRequests().isEmpty()) {
+			return TimeoutCheckResult.FULFILLED;
+		}
+
+		final boolean fulfillable = isSlotRequestBulkFulfillable(slotRequestBulk, slotsRetriever);
+		if (fulfillable) {
+			slotRequestBulk.markFulfillable();
+		} else {
+			final long currentTimestamp = clock.relativeTimeMillis();
+
+			slotRequestBulk.markUnfulfillable(currentTimestamp);
+
+			final long unfulfillableSince = slotRequestBulk.getUnfulfillableSince();
+			if (unfulfillableSince + slotRequestTimeout.toMilliseconds() <= currentTimestamp) {
+				return TimeoutCheckResult.TIMEOUT;
+			}
+		}
+
+		return TimeoutCheckResult.PENDING;
+	}
+
+	/**
+	 * Returns whether the given bulk of slot requests are possible to be fulfilled at the same time
+	 * with all the reusable slots in the slot pool. A reusable slot means the slot is available or
+	 * will not be occupied indefinitely.
+	 *
+	 * @param slotRequestBulk bulk of slot requests to check
+	 * @param slotsRetriever supplies slots to be used for the fulfill-ability check
+	 * @return true if the slot requests are possible to be fulfilled, otherwise false
+	 */
+	@VisibleForTesting
+	static boolean isSlotRequestBulkFulfillable(

Review comment:
       Why do we need to expose `isSlotRequestBulkFulfillable`?
   Could we not test the same with `checkPhysicalSlotRequestBulkTimeout` and e.g. `PhysicalSlotRequestBulk.isFulfillable`?
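
   To illustrate the idea, here is a minimal sketch of driving the timeout check directly with controlled timestamps. It is a simplified model, not the PR code: `Bulk` is a hypothetical stand-in for `PhysicalSlotRequestBulk`, the fulfillability result is passed in as a boolean instead of being computed from slots, and I am assuming that `markUnfulfillable` keeps the earliest timestamp until the bulk becomes fulfillable again.

```java
public class BulkTimeoutCheckSketch {

	enum TimeoutCheckResult { PENDING, FULFILLED, TIMEOUT }

	/** Hypothetical stand-in for PhysicalSlotRequestBulk. */
	static final class Bulk {
		int pendingRequests;
		// Long.MAX_VALUE encodes "currently fulfillable" (an assumption of this sketch).
		long unfulfillableSince = Long.MAX_VALUE;

		Bulk(int pendingRequests) {
			this.pendingRequests = pendingRequests;
		}

		boolean isFulfillable() {
			return unfulfillableSince == Long.MAX_VALUE;
		}

		void markFulfillable() {
			unfulfillableSince = Long.MAX_VALUE;
		}

		void markUnfulfillable(long now) {
			// Assumed behavior: only record the first moment the bulk became unfulfillable.
			if (isFulfillable()) {
				unfulfillableSince = now;
			}
		}
	}

	/** Same branching as checkPhysicalSlotRequestBulkTimeout in the diff above. */
	static TimeoutCheckResult check(Bulk bulk, boolean fulfillable, long now, long timeoutMillis) {
		if (bulk.pendingRequests == 0) {
			return TimeoutCheckResult.FULFILLED;
		}
		if (fulfillable) {
			bulk.markFulfillable();
		} else {
			bulk.markUnfulfillable(now);
			if (bulk.unfulfillableSince + timeoutMillis <= now) {
				return TimeoutCheckResult.TIMEOUT;
			}
		}
		return TimeoutCheckResult.PENDING;
	}

	public static void main(String[] args) {
		Bulk bulk = new Bulk(1);
		System.out.println(check(bulk, false, 0L, 100L));
		System.out.println(check(bulk, false, 100L, 100L));
	}
}
```

   A test along these lines could assert on the returned result and on `isFulfillable()` without calling the static helper.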

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTracker.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.util.clock.Clock;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.IdentityHashMap;
+import java.util.Optional;
+import java.util.Set;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * Tracks physical slot request bulks. Once a bulk is registered, a fulfill-ability check for it will be scheduled.
+ */
+class PhysicalSlotRequestBulkTracker {
+
+	private final Supplier<Set<SlotInfo>> slotsRetriever;
+
+	private final Clock clock;
+
+	private final Set<PhysicalSlotRequestBulk> slotRequestBulks;
+
+	PhysicalSlotRequestBulkTracker(final Supplier<Set<SlotInfo>> slotsRetriever, final Clock clock) {
+		this.slotsRetriever = checkNotNull(slotsRetriever);
+		this.clock = checkNotNull(clock);
+		this.slotRequestBulks = Collections.newSetFromMap(new IdentityHashMap<>());
+	}
+
+	void track(final PhysicalSlotRequestBulk bulk) {
+		slotRequestBulks.add(bulk);
+
+		bulk.markUnfulfillable(clock.relativeTimeMillis());
+	}
+
+	void untrack(final PhysicalSlotRequestBulk bulk) {
+		slotRequestBulks.remove(bulk);
+	}
+
+	@VisibleForTesting
+	boolean isTracked(final PhysicalSlotRequestBulk bulk) {
+		return slotRequestBulks.contains(bulk);
+	}
+
+	/**
+	 * Check the slot request bulk and timeout its requests if it has been unfilfillable for too long.

Review comment:
       ```suggestion
   	 * Check the slot request bulk and timeout its requests if it has been unfulfillable for too long.
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PhysicalSlotRequestBulkTracker.java
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.util.clock.Clock;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.IdentityHashMap;
+import java.util.Optional;
+import java.util.Set;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * Tracks physical slot request bulks. Once a bulk is registered, a fulfill-ability check for it will be scheduled.
+ */
+class PhysicalSlotRequestBulkTracker {
+
+	private final Supplier<Set<SlotInfo>> slotsRetriever;
+
+	private final Clock clock;
+
+	private final Set<PhysicalSlotRequestBulk> slotRequestBulks;

Review comment:
       Not sure I understand why we need `slotRequestBulks`/`track`/`untrack`/`isTracked`.
   They look like they are only used by tests. Can we remove them?
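
   For context, the members in question keep bulks in an identity-based set (`Collections.newSetFromMap(new IdentityHashMap<>())`). Below is a minimal, self-contained sketch of how such a set behaves. The `Bulk` class and the `newIdentitySet` helper are hypothetical stand-ins for illustration, not the classes from this PR.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class IdentitySetSketch {

    /** Hypothetical bulk whose equals() compares by name, to make the contrast visible. */
    static final class Bulk {
        final String name;

        Bulk(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Bulk && ((Bulk) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    /** Builds a set that compares elements by reference (==) instead of equals(). */
    static Set<Bulk> newIdentitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<>());
    }

    public static void main(String[] args) {
        Set<Bulk> tracked = newIdentitySet();
        Bulk first = new Bulk("bulk");
        Bulk second = new Bulk("bulk"); // equal to 'first', but a distinct instance

        tracked.add(first);                           // "track"
        System.out.println(tracked.contains(first));  // same instance
        System.out.println(tracked.contains(second)); // equal, but never tracked

        tracked.remove(first);                        // "untrack"
        System.out.println(tracked.isEmpty());
    }
}
```

   With an identity set, only the exact instance that was added counts as tracked, regardless of how `equals` is defined.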







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * cd3fc98c034fdc61235d9109c05b4f55d7423746 UNKNOWN
   * a8902557952ea70746d27e88a392c74724784605 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2848) 
   * 1e045704e218f2cb0fb9dcaa31176c8943f468a5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=3141) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r438606184



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(
+			this::getAllSlotInfos,
+			SystemClock.getInstance());
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public void cancelSlotRequest(SlotRequestId slotRequestId, Throwable cause) {
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		slotPool.releaseSlot(slotRequestId, cause);
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else if (physicalSlotRequest.willSlotBeOccupiedIndefinitely()) {
+			slotFuture = slotPool.requestNewAllocatedSlot(
+				slotRequestId,
+				resourceProfile,
+				null);
+		} else {
+			slotFuture = slotPool.requestNewAllocatedBatchSlot(
+				slotRequestId,
+				resourceProfile);
+		}
+
+		return slotFuture.thenApply(physicalSlot -> new PhysicalSlotRequest.Result(slotRequestId, physicalSlot));
+	}
+
+	private Optional<PhysicalSlot> tryAllocateFromAvailable(
+			final SlotRequestId slotRequestId,
+			final SlotProfile slotProfile) {
+
+		final Collection<SlotSelectionStrategy.SlotInfoAndResources> slotInfoList =
+			slotPool.getAvailableSlotsInformation()
+				.stream()
+				.map(SlotSelectionStrategy.SlotInfoAndResources::fromSingleSlot)
+				.collect(Collectors.toList());
+
+		final Optional<SlotSelectionStrategy.SlotInfoAndLocality> selectedAvailableSlot =
+			slotSelectionStrategy.selectBestSlotForProfile(slotInfoList, slotProfile);
+
+		return selectedAvailableSlot.flatMap(
+			slotInfoAndLocality -> slotPool.allocateAvailableSlot(
+				slotRequestId,
+				slotInfoAndLocality.getSlotInfo().getAllocationId())
+		);
+	}
+
+	private void schedulePendingRequestBulkTimeoutCheck(
+			final PhysicalSlotRequestBulk slotRequestBulk,
+			final Time timeout) {
+
+		componentMainThreadExecutor.schedule(() -> {
+			final PhysicalSlotRequestBulkTracker.TimeoutCheckResult result =
+				slotRequestBulkTracker.checkPhysicalSlotRequestBulkTimeout(slotRequestBulk, timeout);
+
+			switch (result) {
+				case PENDING:
+					//re-schedule the timeout check
+					schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+					break;
+				case TIMEOUT:
+					timeoutSlotRequestBulk(slotRequestBulk);
+					break;
+				default: // no action to take
+			}
+		}, timeout.getSize(), timeout.getUnit());
+	}
+
+	private void timeoutSlotRequestBulk(final PhysicalSlotRequestBulk slotRequestBulk) {
+		final Exception cause = new TimeoutException("Slot request bulk is not fulfillable!");
+		// pending requests must be canceled first otherwise they might be fulfilled by
+		// allocated slots released from this bulk
+		for (SlotRequestId slotRequestId : slotRequestBulk.getPendingRequests().keySet()) {
+			cancelSlotRequest(slotRequestId, cause);
+		}
+		for (SlotRequestId slotRequestId : slotRequestBulk.getFulfilledRequests().keySet()) {
+			cancelSlotRequest(slotRequestId, cause);
+		}
+	}
+
+	private Set<SlotInfo> getAllSlotInfos() {
+		return Stream
+			.concat(
+				slotPool.getAvailableSlotsInformation().stream(),
+				slotPool.getAllocatedSlotsInformation().stream())
+			.collect(Collectors.toSet());
+	}

Review comment:
       `PhysicalSlotRequestBulkTracker` does not contain any actions on slot requests. All actions (allocating/canceling) take place in `BulkSlotProviderImpl`. I think this makes it easier to reason about the slot request status.
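
   The split described above can be sketched roughly as follows. This is a simplified, self-contained illustration, not the code from this PR: `Bulk`, `Tracker`, `Provider`, the `FULFILLED` constant and the recorded action strings are hypothetical stand-ins, and real scheduling is replaced by explicit calls with a timestamp.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkTimeoutSketch {

    /** Outcome of one timeout check; PENDING and TIMEOUT mirror the switch in the diff. */
    enum TimeoutCheckResult { PENDING, TIMEOUT, FULFILLED }

    /** Hypothetical, simplified stand-in for a slot request bulk. */
    static final class Bulk {
        int pendingRequests;
        boolean fulfillable;
        final long unfulfillableSinceMillis;

        Bulk(int pendingRequests, boolean fulfillable, long unfulfillableSinceMillis) {
            this.pendingRequests = pendingRequests;
            this.fulfillable = fulfillable;
            this.unfulfillableSinceMillis = unfulfillableSinceMillis;
        }
    }

    /** Tracker: only inspects state and reports a result; it never allocates or cancels. */
    static final class Tracker {
        TimeoutCheckResult check(Bulk bulk, long nowMillis, long timeoutMillis) {
            if (bulk.pendingRequests == 0) {
                return TimeoutCheckResult.FULFILLED;
            }
            if (bulk.fulfillable) {
                return TimeoutCheckResult.PENDING;
            }
            return nowMillis - bulk.unfulfillableSinceMillis >= timeoutMillis
                ? TimeoutCheckResult.TIMEOUT
                : TimeoutCheckResult.PENDING;
        }
    }

    /** Provider: the single place that acts on requests, based on the tracker's result. */
    static final class Provider {
        private final Tracker tracker = new Tracker();
        final List<String> actions = new ArrayList<>();

        void onTimeoutCheck(Bulk bulk, long nowMillis, long timeoutMillis) {
            switch (tracker.check(bulk, nowMillis, timeoutMillis)) {
                case PENDING:
                    actions.add("reschedule"); // would re-schedule the check
                    break;
                case TIMEOUT:
                    actions.add("cancel"); // would cancel all requests of the bulk
                    break;
                default:
                    break; // fulfilled: no action to take
            }
        }
    }

    public static void main(String[] args) {
        Provider provider = new Provider();
        Bulk bulk = new Bulk(2, false, 0L);
        provider.onTimeoutCheck(bulk, 5L, 10L);  // not yet unfulfillable for long enough
        provider.onTimeoutCheck(bulk, 10L, 10L); // timeout reached
        System.out.println(provider.actions);
    }
}
```

   Because the tracker only returns a result, every state change to a request can be found by reading the provider alone.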







[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r435198511



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/AllocatedSlotOccupationTest.java
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.executiongraph.utils.SimpleAckingTaskManagerGateway;
+import org.apache.flink.runtime.jobmanager.scheduler.Locality;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.LogicalSlot;
+import org.apache.flink.runtime.jobmaster.SlotOwner;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation;
+import org.apache.flink.util.TestLogger;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Tests whether the slot occupation state of {@link AllocatedSlot} is correct.
+ */
+public class AllocatedSlotOccupationTest extends TestLogger {
+
+	@Test
+	public void testSingleTaskOccupyingSlotIndefinitely() {
+		final PhysicalSlot physicalSlot = createPhysicalSlot();
+		allocateSingleLogicalSlotFromPhysicalSlot(physicalSlot, true);
+
+		assertTrue(physicalSlot.willBeOccupiedIndefinitely());
+	}
+
+	@Test
+	public void testSingleTaskNotOccupyingSlotIndefinitely() {
+		final PhysicalSlot physicalSlot = createPhysicalSlot();
+		allocateSingleLogicalSlotFromPhysicalSlot(physicalSlot, false);
+
+		assertFalse(physicalSlot.willBeOccupiedIndefinitely());
+	}
+
+	private static PhysicalSlot createPhysicalSlot() {
+		return new AllocatedSlot(
+			new AllocationID(),
+			new LocalTaskManagerLocation(),
+			0,
+			ResourceProfile.ANY,
+			new SimpleAckingTaskManagerGateway());
+	}
+
+	static LogicalSlot allocateSingleLogicalSlotFromPhysicalSlot(
+			final PhysicalSlot physicalSlot,
+			final boolean slotWillBeOccupiedIndefinitely) {
+
+		return allocateSingleLogicalSlotFromPhysicalSlot(
+			new SlotRequestId(),
+			physicalSlot,
+			new TestingSlotOwner(),
+			slotWillBeOccupiedIndefinitely);
+	}
+
+	static LogicalSlot allocateSingleLogicalSlotFromPhysicalSlot(

Review comment:
       Done via cafd808b7d2870593dd42c1aab124600a9fec73f







[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * a758c3a88a0a6cc2e0b183b20a17fe618f8cbbb9 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2396) 
   * 395c6cba8c463cffe2cf3cc70d91e6c6ff0dbb9f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zhuzhurk commented on a change in pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on a change in pull request #12375:
URL: https://github.com/apache/flink/pull/12375#discussion_r434988361



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/BulkSlotProviderImpl.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.clusterframework.types.SlotProfile;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import org.apache.flink.runtime.concurrent.FutureUtils;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.util.clock.Clock;
+import org.apache.flink.util.clock.SystemClock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * Default implementation of {@link BulkSlotProvider}.
+ */
+class BulkSlotProviderImpl implements BulkSlotProvider {
+
+	private static final Logger LOG = LoggerFactory.getLogger(BulkSlotProviderImpl.class);
+
+	private ComponentMainThreadExecutor componentMainThreadExecutor;
+
+	private final SlotSelectionStrategy slotSelectionStrategy;
+
+	private final SlotPool slotPool;
+
+	private final Clock clock;
+
+	private final PhysicalSlotRequestBulkTracker slotRequestBulkTracker;
+
+	BulkSlotProviderImpl(final SlotSelectionStrategy slotSelectionStrategy, final SlotPool slotPool) {
+		this(slotSelectionStrategy, slotPool, SystemClock.getInstance());
+	}
+
+	@VisibleForTesting
+	BulkSlotProviderImpl(
+			final SlotSelectionStrategy slotSelectionStrategy,
+			final SlotPool slotPool,
+			final Clock clock) {
+
+		this.slotSelectionStrategy = checkNotNull(slotSelectionStrategy);
+		this.slotPool = checkNotNull(slotPool);
+		this.clock = checkNotNull(clock);
+
+		this.slotRequestBulkTracker = new PhysicalSlotRequestBulkTracker(clock);
+
+		this.componentMainThreadExecutor = new ComponentMainThreadExecutor.DummyComponentMainThreadExecutor(
+			"Scheduler is not initialized with proper main thread executor. " +
+				"Call to BulkSlotProvider.start(...) required.");
+	}
+
+	@Override
+	public void start(final ComponentMainThreadExecutor mainThreadExecutor) {
+		this.componentMainThreadExecutor = mainThreadExecutor;
+	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		componentMainThreadExecutor.assertRunningInMainThread();
+
+		LOG.debug("Received {} slot requests.", physicalSlotRequests.size());
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request, timeout).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest,
+			final Time timeout) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+		final ResourceProfile resourceProfile = slotProfile.getPhysicalSlotResourceProfile();
+
+		LOG.debug("Received slot request [{}] with resource requirements: {}", slotRequestId, resourceProfile);
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);
+
+		final CompletableFuture<PhysicalSlot> slotFuture;
+		if (availablePhysicalSlot.isPresent()) {
+			slotFuture = CompletableFuture.completedFuture(availablePhysicalSlot.get());
+		} else {
+			slotFuture = slotPool.requestNewAllocatedSlotWithoutTimeout(

Review comment:
       Anyhow, I totally agree that we should make it easier for users to debug bulk slot allocation failures.
   I opened a ticket, FLINK-18114, to track it.







[GitHub] [flink] flinkbot commented on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   ## CI report:
   
   * 4fe765587c0423b2d29d5dde946b94edff79398b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

