Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/10/28 15:51:52 UTC

[GitHub] [flink] pnowojski opened a new pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

pnowojski opened a new pull request #13827:
URL: https://github.com/apache/flink/pull/13827


   The purpose of this PR is to time out aligned checkpoints based on `checkpointStartDelay`.
   
   This PR is based on https://github.com/apache/flink/pull/13741, so please ignore the first couple of commits in this review.
   
   ## Verifying this change
   
   This PR adds a couple of tests. In addition, the change is exercised by every existing test that enables unaligned checkpoints (the alignment timeout has a default value of 30 seconds).
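A minimal sketch of the timeout check described above. It assumes the checkpoint start delay is the difference between the current time at the receiving subtask and the timestamp carried by the barrier, and that the comparison is strict, as in the `maybeTimeout` diff further down this thread. `StartDelayTimeoutPolicy` and its method are hypothetical names, not Flink's implementation.

```java
// Hypothetical sketch (not Flink code): decide whether an aligned checkpoint
// barrier should be turned into an unaligned one, based on the checkpoint start delay.
final class StartDelayTimeoutPolicy {
    /** Default alignment timeout mentioned in the PR description: 30 seconds. */
    static final long DEFAULT_ALIGNMENT_TIMEOUT_MS = 30_000L;

    private final long alignmentTimeoutMs;

    StartDelayTimeoutPolicy(long alignmentTimeoutMs) {
        if (alignmentTimeoutMs < 0) {
            throw new IllegalArgumentException("alignment timeout must be >= 0");
        }
        this.alignmentTimeoutMs = alignmentTimeoutMs;
    }

    /**
     * @param barrierTimestampMs time at which the checkpoint was triggered (carried by the barrier)
     * @param nowMs              current wall-clock time at the receiving subtask
     * @param timeoutable        whether the checkpoint options allow switching to unaligned
     */
    boolean shouldTimeout(long barrierTimestampMs, long nowMs, boolean timeoutable) {
        long checkpointStartDelayMs = nowMs - barrierTimestampMs;
        // Strictly greater than the timeout, mirroring the condition in the diff.
        return timeoutable && checkpointStartDelayMs > alignmentTimeoutMs;
    }
}
```

With the 30 s default, a barrier that arrives exactly 30 000 ms after the trigger is still processed as aligned; one millisecond later it would be switched.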
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / **no**)
     - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * d636320d382b8561205cea4fddd1150fa18b11e8 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=8528) 
   * d90e259e26540d749797fe78b59b50cb94e8c66d UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * f1a0b7f794c8dcbd1a3221f262b2632c1e57f975 UNKNOWN
   * dc80e52b6ba5209a31613869500d9d12301076bb Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9106) 
   * 6fb107e78a75e28b334c5afaf47428f4485a8fc8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9119) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * ed4f8fcc0c3674eb4006f07679b1cf77cd827f24 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9084) 
   * f1a0b7f794c8dcbd1a3221f262b2632c1e57f975 UNKNOWN
   * 0d98df2df5dde005c6b78dbd1e775ff6e8ed801e UNKNOWN
   



[GitHub] [flink] pnowojski commented on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
pnowojski commented on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-742575787


   Superseded by https://github.com/apache/flink/pull/14057





[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * f1a0b7f794c8dcbd1a3221f262b2632c1e57f975 UNKNOWN
   * 6fb107e78a75e28b334c5afaf47428f4485a8fc8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9119) 
   



[GitHub] [flink] AHeise commented on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
AHeise commented on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718670698


   > @AHeise, it would be nice to reduce the alignment timeout for our tests from the default 30s down to, let's say, 100ms. Do you have an idea how we can do it easily/cheaply?
   
   I'd address that in the test randomization setup. I don't see an easy way other than relaying a system property from `pom.xml` to the config.
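One way to relay such a value, sketched under assumptions: the build passes a JVM system property to the tests (for example through the Maven Surefire plugin's `<systemPropertyVariables>` element in `pom.xml`), and the test setup reads it with a 30-second fallback. The property name and class are hypothetical; wiring the value into the Flink configuration is left out because the exact option key is not given in this thread.

```java
// Hypothetical sketch: resolve the alignment timeout for tests from a JVM system
// property set by the build, falling back to the 30 s default.
final class TestAlignmentTimeout {
    /** Made-up property name, for illustration only. */
    static final String PROPERTY = "test.checkpointing.alignment-timeout-ms";
    static final long DEFAULT_MS = 30_000L;

    /** Returns the property value in ms, or the default if it is unset or not a valid number. */
    static long resolve() {
        // Long.getLong returns the fallback when the property is missing or malformed.
        return Long.getLong(PROPERTY, DEFAULT_MS);
    }
}
```

The randomized test setup could then call `TestAlignmentTimeout.resolve()` once and put the result into whatever configuration object the test cluster uses.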


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] SteNicholas commented on a change in pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on a change in pull request #13827:
URL: https://github.com/apache/flink/pull/13827#discussion_r514985679



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java
##########
@@ -517,6 +517,21 @@ public void checkpointStopped(long checkpointId) {
 		}
 	}
 
+	@Override
+	public void convertToPriorityEvent(int sequenceNumber) throws IOException {
+		boolean firstPriorityEvent;
+		synchronized (receivedBuffers) {
+			checkState(!channelStatePersister.hasBarrierReceived());
+			SequenceBuffer toPrioritize = receivedBuffers.getAndRemove(
+				sequenceBuffer -> sequenceBuffer.sequenceNumber == sequenceNumber);
+			checkState(!toPrioritize.buffer.isBuffer());
+			firstPriorityEvent = addPriorityBuffer(toPrioritize);
+		}
+		if (firstPriorityEvent) {
+			notifyPriorityEvent(sequenceNumber);
+		}
+	}
+

Review comment:
       @pnowojski, I think the `firstPriorityEvent` variable is unnecessary and could be removed as follows:
   ```java
   public void convertToPriorityEvent(int sequenceNumber) throws IOException {
   	synchronized (receivedBuffers) {
   		checkState(!channelStatePersister.hasBarrierReceived());
   		SequenceBuffer toPrioritize = receivedBuffers.getAndRemove(
   			sequenceBuffer -> sequenceBuffer.sequenceNumber == sequenceNumber);
   		checkState(!toPrioritize.buffer.isBuffer());
   		if (addPriorityBuffer(toPrioritize)) {
   			notifyPriorityEvent(sequenceNumber);
   		}
   	}
   }
   ```
   I also have a question: does `firstPriorityEvent` refer to the most recently aligned `CheckpointBarrier`?
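In the original diff, the local flag lets `notifyPriorityEvent` run after the `synchronized (receivedBuffers)` block has been exited, whereas the suggested version calls it while still holding the lock. A simplified, hypothetical model of the original pattern follows; it assumes, for illustration only, that `addPriorityBuffer` returns `true` when the priority queue was empty before the insert. The class and its fields are not Flink's.

```java
import java.util.ArrayDeque;
import java.util.function.IntConsumer;

// Hypothetical model (not Flink code): mutate the queues while holding the lock,
// but invoke the listener only after the lock has been released.
final class PriorityConvertingChannel {
    private final Object lock = new Object();
    private final ArrayDeque<Integer> received = new ArrayDeque<>(); // sequence numbers in arrival order
    private final ArrayDeque<Integer> priority = new ArrayDeque<>(); // prioritized events
    private final IntConsumer priorityListener;

    PriorityConvertingChannel(IntConsumer priorityListener) {
        this.priorityListener = priorityListener;
    }

    void receive(int sequenceNumber) {
        synchronized (lock) {
            received.add(sequenceNumber);
        }
    }

    void convertToPriorityEvent(int sequenceNumber) {
        boolean firstPriorityEvent;
        synchronized (lock) {
            if (!received.remove(Integer.valueOf(sequenceNumber))) {
                throw new IllegalStateException("unknown sequence number " + sequenceNumber);
            }
            // Only the first priority event needs to wake up the consumer.
            firstPriorityEvent = priority.isEmpty();
            priority.add(sequenceNumber);
        }
        // Outside the lock: the listener may call back into this channel or block.
        if (firstPriorityEvent) {
            priorityListener.accept(sequenceNumber);
        }
    }
}
```

The trade-off is one extra local variable in exchange for never running foreign callback code under the channel's lock.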







[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * ed4f8fcc0c3674eb4006f07679b1cf77cd827f24 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9084) 
   



[GitHub] [flink] pnowojski closed pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
pnowojski closed pull request #13827:
URL: https://github.com/apache/flink/pull/13827


   





[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * f1a0b7f794c8dcbd1a3221f262b2632c1e57f975 UNKNOWN
   * 6fb107e78a75e28b334c5afaf47428f4485a8fc8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9119) 
   * a82a2415ec2e764329ffac993d91f52f8a98084c UNKNOWN
   



[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * d636320d382b8561205cea4fddd1150fa18b11e8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=8528) 
   



[GitHub] [flink] flinkbot commented on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * d636320d382b8561205cea4fddd1150fa18b11e8 UNKNOWN
   



[GitHub] [flink] pnowojski commented on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
pnowojski commented on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718028073


   @AHeise, it would be nice to reduce the alignment timeout for our tests from the default 30s down to, let's say, 100ms. Do you have an idea how we can do it easily/cheaply?





[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * d636320d382b8561205cea4fddd1150fa18b11e8 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=8528) 
   



[GitHub] [flink] pnowojski commented on a change in pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
pnowojski commented on a change in pull request #13827:
URL: https://github.com/apache/flink/pull/13827#discussion_r514425412



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java
##########
@@ -517,6 +517,21 @@ public void checkpointStopped(long checkpointId) {
 		}
 	}
 
+	@Override
+	public void convertToPriorityEvent(int sequenceNumber) throws IOException {
+		boolean firstPriorityEvent;
+		synchronized (receivedBuffers) {
+			checkState(!channelStatePersister.hasBarrierReceived());
+			SequenceBuffer toPrioritize = receivedBuffers.getAndRemove(
+				sequenceBuffer -> sequenceBuffer.sequenceNumber == sequenceNumber);
+			checkState(!toPrioritize.buffer.isBuffer());
+			firstPriorityEvent = addPriorityBuffer(toPrioritize);
+		}
+		if (firstPriorityEvent) {
+			notifyPriorityEvent(sequenceNumber);
+		}
+	}
+

Review comment:
       This is different from what we discussed offline, @AHeise. Instead of switching to the pattern of keeping the `CheckpointBarrier` at the tail of the input queue and spilling everything that falls between the `CheckpointBarrier` and its announcement, I decided, for the sake of simplicity, to keep the current approach of using priority events. Instead, I'm just prioritizing the previously announced aligned `CheckpointBarrier`.
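A simplified, hypothetical model of prioritizing an already announced barrier: the barrier stays at its original position in the channel queue until the alignment times out, and is then moved to the head. The data buffers it jumps over are, roughly, the in-flight data that an unaligned checkpoint has to persist. The class and method names are illustrative, not Flink's.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical model (not Flink code) of moving an announced barrier to the queue head.
final class AnnouncedBarrierQueue {
    /** A queued item: a data buffer or a barrier, identified by its sequence number. */
    static final class Entry {
        final int sequenceNumber;
        final boolean isBarrier;

        Entry(int sequenceNumber, boolean isBarrier) {
            this.sequenceNumber = sequenceNumber;
            this.isBarrier = isBarrier;
        }
    }

    private final LinkedList<Entry> queue = new LinkedList<>();

    void add(int sequenceNumber, boolean isBarrier) {
        queue.addLast(new Entry(sequenceNumber, isBarrier));
    }

    /**
     * Moves the announced barrier to the head of the queue and returns the sequence
     * numbers of the data buffers it overtook.
     */
    List<Integer> prioritizeBarrier(int barrierSequenceNumber) {
        List<Integer> overtaken = new ArrayList<>();
        Iterator<Entry> it = queue.iterator();
        while (it.hasNext()) {
            Entry entry = it.next();
            if (entry.sequenceNumber == barrierSequenceNumber) {
                if (!entry.isBarrier) {
                    throw new IllegalStateException("not a barrier: " + barrierSequenceNumber);
                }
                it.remove();
                queue.addFirst(entry); // safe: iteration stops right after this
                return overtaken;
            }
            if (!entry.isBarrier) {
                overtaken.add(entry.sequenceNumber);
            }
        }
        throw new IllegalStateException("barrier not found: " + barrierSequenceNumber);
    }

    int headSequenceNumber() {
        return queue.getFirst().sequenceNumber;
    }
}
```

Compared with keeping the barrier at the tail and spilling everything between it and its announcement, this keeps a single queue and reuses the existing priority mechanism, which matches the simplicity argument in the comment above.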







[GitHub] [flink] rkhachatryan commented on a change in pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
rkhachatryan commented on a change in pull request #13827:
URL: https://github.com/apache/flink/pull/13827#discussion_r527794600



##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/SingleCheckpointBarrierHandler.java
##########
@@ -235,6 +257,23 @@ int getNumOpenChannels() {
 		return numOpenChannels;
 	}
 
+	private CheckpointBarrier maybeTimeout(CheckpointBarrier barrier) {
+		CheckpointOptions options = barrier.getCheckpointOptions();
+		boolean shouldTimeout = (options.isTimeoutable()) && (
+			barrier.getId() == timeoutedBarrierId ||
+			(System.currentTimeMillis() - barrier.getTimestamp()) > options.getAlignmentTimeout());

Review comment:
       After a discussion with @NicoK, @sjwiesman and @alpinegizmo, we decided to:
   1. Base the timeout decision on the alignment start time
   1. By default, propagate this decision downstream; provide an option to disable propagation
   1. In the UI, show the checkpoint type for each subtask; at the checkpoint level, display "unaligned" if at least one subtask performed an unaligned checkpoint (UC)
   1. Consider renaming the `alignment timeout` option to `subtask alignment timeout`
   
   Considerations:
   - The overhead of UC (persisting channel state) should ideally be localized
   - The less global the decision is, the more difficult it might be to debug UC-related issues
   - In a common scenario, backpressure comes from sinks; buffers will be full, so disabling propagation doesn't make a difference
   
   A ticket to address points 2-4: FLINK-20488
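A minimal sketch of point 1 together with the propagation toggle from point 2, assuming the alignment starts when the subtask receives the first barrier of a checkpoint on any of its channels. `AlignmentStartTimeout` and its methods are hypothetical names, not Flink API; resetting the state after a checkpoint completes is omitted for brevity.

```java
// Hypothetical sketch (not Flink code): the timeout clock starts when this subtask
// receives the first barrier of a checkpoint, not when the checkpoint was triggered.
final class AlignmentStartTimeout {
    private final long subtaskAlignmentTimeoutMs;
    private final boolean propagateUnaligned; // point 2: propagation is the proposed default
    private boolean aligning = false;
    private long currentCheckpointId = Long.MIN_VALUE;
    private long alignmentStartMs;

    AlignmentStartTimeout(long subtaskAlignmentTimeoutMs, boolean propagateUnaligned) {
        this.subtaskAlignmentTimeoutMs = subtaskAlignmentTimeoutMs;
        this.propagateUnaligned = propagateUnaligned;
    }

    /** Called for every barrier received on any input channel of this subtask. */
    void onBarrier(long checkpointId, long nowMs) {
        if (!aligning || checkpointId != currentCheckpointId) {
            // First barrier of this checkpoint: alignment starts now.
            aligning = true;
            currentCheckpointId = checkpointId;
            alignmentStartMs = nowMs;
        }
    }

    /** @param upstreamWasUnaligned whether an upstream subtask already switched to unaligned */
    boolean shouldSwitchToUnaligned(long nowMs, boolean upstreamWasUnaligned) {
        if (!aligning) {
            return false; // no alignment in progress
        }
        if (propagateUnaligned && upstreamWasUnaligned) {
            return true; // follow the upstream decision
        }
        return nowMs - alignmentStartMs > subtaskAlignmentTimeoutMs;
    }
}
```

Measuring from the alignment start keeps the decision local to the subtask, which fits the first consideration; with propagation disabled, each subtask decides purely from its own clock.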







[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * f1a0b7f794c8dcbd1a3221f262b2632c1e57f975 UNKNOWN
   * a82a2415ec2e764329ffac993d91f52f8a98084c Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9180) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * d90e259e26540d749797fe78b59b50cb94e8c66d Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=8618) 
   * ed4f8fcc0c3674eb4006f07679b1cf77cd827f24 UNKNOWN
   





[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * d636320d382b8561205cea4fddd1150fa18b11e8 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=8528) 
   * d90e259e26540d749797fe78b59b50cb94e8c66d Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=8618) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * ed4f8fcc0c3674eb4006f07679b1cf77cd827f24 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9084) 
   * f1a0b7f794c8dcbd1a3221f262b2632c1e57f975 UNKNOWN
   





[GitHub] [flink] rkhachatryan commented on a change in pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
rkhachatryan commented on a change in pull request #13827:
URL: https://github.com/apache/flink/pull/13827#discussion_r524193666



##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/SingleCheckpointBarrierHandler.java
##########
@@ -235,6 +257,23 @@ int getNumOpenChannels() {
 		return numOpenChannels;
 	}
 
+	private CheckpointBarrier maybeTimeout(CheckpointBarrier barrier) {
+		CheckpointOptions options = barrier.getCheckpointOptions();
+		boolean shouldTimeout = (options.isTimeoutable()) && (
+			barrier.getId() == timeoutedBarrierId ||
+			(System.currentTimeMillis() - barrier.getTimestamp()) > options.getAlignmentTimeout());

Review comment:
       We discussed offline two options for switching a checkpoint from aligned to unaligned:
   1. after some time since the start of the checkpoint on the JM
   1. after some time since the start of alignment on this subtask
   
   (in either case, all the downstream tasks would proceed in UC mode with this checkpoint after it has timed out)
   
   The latter gives more control, i.e. the ability to address long alignment in isolation, but it does not address long barrier travel time (e.g. a long sync phase or simply a long pipeline).
   
   We couldn't agree on which one would be more convenient for the users.
   
   @NicoK, do you have an opinion on that?
   
   Or maybe we should provide both?
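The difference between the two options can be made concrete with a small sketch. It is hypothetical: `TimeoutBasisSketch` and its parameters are invented for illustration. `checkpointStartMillis` stands for the checkpoint's start time (the role `barrier.getTimestamp()` plays in the snippet under review), and `alignmentStartMillis` for the moment this subtask received the first barrier of the checkpoint.

```java
/**
 * Hypothetical sketch contrasting the two timeout bases discussed above.
 * Not Flink code; all names are illustrative.
 */
final class TimeoutBasisSketch {

    enum Basis { CHECKPOINT_START_ON_JM, ALIGNMENT_START_ON_SUBTASK }

    /** Returns true if the checkpoint should switch to unaligned under the given basis. */
    static boolean timedOut(
            Basis basis,
            long checkpointStartMillis,
            long alignmentStartMillis,
            long nowMillis,
            long timeoutMillis) {
        // Option 1 counts barrier travel time; option 2 counts only local alignment time.
        long start = basis == Basis.CHECKPOINT_START_ON_JM ? checkpointStartMillis : alignmentStartMillis;
        return nowMillis - start > timeoutMillis;
    }
}
```

With a 500 ms timeout, a checkpoint whose barrier needed 900 ms to reach the subtask and has been aligning there for 100 ms is timed out under option 1 but not under option 2. Only option 1 reacts to long barrier travel time, which is the trade-off described above.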







[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * f1a0b7f794c8dcbd1a3221f262b2632c1e57f975 UNKNOWN
   * dc80e52b6ba5209a31613869500d9d12301076bb Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9106) 
   * 6fb107e78a75e28b334c5afaf47428f4485a8fc8 UNKNOWN
   





[GitHub] [flink] flinkbot commented on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718029896


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit d636320d382b8561205cea4fddd1150fa18b11e8 (Wed Oct 28 15:55:13 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
   
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>





[GitHub] [flink] SteNicholas commented on a change in pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on a change in pull request #13827:
URL: https://github.com/apache/flink/pull/13827#discussion_r514985679



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java
##########
@@ -517,6 +517,21 @@ public void checkpointStopped(long checkpointId) {
 		}
 	}
 
+	@Override
+	public void convertToPriorityEvent(int sequenceNumber) throws IOException {
+		boolean firstPriorityEvent;
+		synchronized (receivedBuffers) {
+			checkState(!channelStatePersister.hasBarrierReceived());
+			SequenceBuffer toPrioritize = receivedBuffers.getAndRemove(
+				sequenceBuffer -> sequenceBuffer.sequenceNumber == sequenceNumber);
+			checkState(!toPrioritize.buffer.isBuffer());
+			firstPriorityEvent = addPriorityBuffer(toPrioritize);
+		}
+		if (firstPriorityEvent) {
+			notifyPriorityEvent(sequenceNumber);
+		}
+	}
+

Review comment:
       @pnowojski, I think the `firstPriorityEvent` variable is unnecessary and could be removed as follows:
   ```
   public void convertToPriorityEvent(int sequenceNumber) throws IOException {
   	synchronized (receivedBuffers) {
   		checkState(!channelStatePersister.hasBarrierReceived());
   		SequenceBuffer toPrioritize = receivedBuffers.getAndRemove(
   			sequenceBuffer -> sequenceBuffer.sequenceNumber == sequenceNumber);
   		checkState(!toPrioritize.buffer.isBuffer());
   		if (addPriorityBuffer(toPrioritize)) {
   			notifyPriorityEvent(sequenceNumber);
   		}
   	}
   }
   ```
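The two versions are not strictly equivalent: in the original diff, `notifyPriorityEvent` runs after the `synchronized (receivedBuffers)` block has been released, while in the suggestion it runs with the lock held. Computing a flag under the lock and invoking the callback afterwards is a common way to avoid running listener code while holding a monitor. The sketch below shows that pattern in isolation. `PriorityChannelSketch`, its `ArrayDeque` and the `IntConsumer` listener are hypothetical stand-ins, not Flink's `RemoteInputChannel` internals.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.IntConsumer;

/**
 * Hypothetical sketch: decide under the lock, notify after releasing it.
 * Not Flink code; names are illustrative.
 */
final class PriorityChannelSketch {

    private final Deque<Integer> priorityEvents = new ArrayDeque<>();
    private final IntConsumer listener;

    PriorityChannelSketch(IntConsumer listener) {
        this.listener = listener;
    }

    void addPriorityEvent(int sequenceNumber) {
        boolean firstPriorityEvent;
        synchronized (priorityEvents) {
            // Record, under the lock, whether the queue was empty before this event.
            firstPriorityEvent = priorityEvents.isEmpty();
            priorityEvents.addLast(sequenceNumber);
        }
        // The monitor has been released here, so the listener never runs while it is held.
        if (firstPriorityEvent) {
            listener.accept(sequenceNumber);
        }
    }

    /** Diagnostic helper: true if the calling thread currently holds the queue's monitor. */
    boolean lockHeldByCurrentThread() {
        return Thread.holdsLock(priorityEvents);
    }
}
```

Only the first event in an empty queue triggers a notification; later events are queued silently, mirroring how `firstPriorityEvent` gates `notifyPriorityEvent` in the diff.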







[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * d90e259e26540d749797fe78b59b50cb94e8c66d Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=8618) 
   * ed4f8fcc0c3674eb4006f07679b1cf77cd827f24 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9084) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * d90e259e26540d749797fe78b59b50cb94e8c66d Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=8618) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13827:
URL: https://github.com/apache/flink/pull/13827#issuecomment-718044724


   ## CI report:
   
   * f1a0b7f794c8dcbd1a3221f262b2632c1e57f975 UNKNOWN
   * 0d98df2df5dde005c6b78dbd1e775ff6e8ed801e Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9101) 
   * dc80e52b6ba5209a31613869500d9d12301076bb UNKNOWN
   

