Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2021/01/29 12:20:55 UTC

[GitHub] [flink] pnowojski opened a new pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

pnowojski opened a new pull request #14807:
URL: https://github.com/apache/flink/pull/14807


   This PR depends on https://github.com/apache/flink/pull/14797
   
   This PR fixes two bugs. First:
   ```
       If the previous checkpoint is declined, a task can receive both the older and the newer
       checkpoint barrier on two different channels before processing any checkpoint cancellation
       message/RPC. If the newer checkpoint barrier happens to be processed before the obsolete one,
       an incorrect `checkState` in ChannelStatePersister would cause a job failure. This checkState
       assumed that the previous checkpoint would have been aborted/stopped before the new one was
       triggered, while in reality the previous checkpoint was never triggered on this task,
       so it could not have been stopped either.
   ```
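The ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyChannelPersister`, its `Status` enum, and its method signatures are hypothetical stand-ins invented for illustration and are not Flink's `ChannelStatePersister`. It shows a per-channel tracker that tolerates an obsolete barrier for a checkpoint that was never started on this task, while still rejecting a checkpoint older than a barrier it has already seen.

```java
/** Toy per-channel tracker; a hypothetical stand-in, not Flink's ChannelStatePersister. */
final class ToyChannelPersister {
    enum Status { COMPLETED, BARRIER_PENDING, BARRIER_RECEIVED }

    private Status status = Status.COMPLETED;
    private long lastSeenBarrier = -1L;

    /** A barrier arrived on this channel; remember only the newest id. */
    void onBarrier(long barrierId) {
        if (barrierId > lastSeenBarrier) {
            lastSeenBarrier = barrierId;
            status = Status.BARRIER_RECEIVED;
        }
    }

    /**
     * The task starts checkpoint {@code checkpointId}.
     *
     * @return true if this channel still has to persist in-flight data
     */
    boolean startPersisting(long checkpointId) {
        if (status == Status.BARRIER_RECEIVED && lastSeenBarrier > checkpointId) {
            // A newer barrier was already seen, so the requested checkpoint is obsolete.
            throw new IllegalStateException(
                    "Checkpoint " + checkpointId + " is obsolete; barrier "
                            + lastSeenBarrier + " was already received");
        }
        if (lastSeenBarrier < checkpointId) {
            // Covers the case of an obsolete barrier left over from a checkpoint that
            // was never started on this task: no assertion failure, just start.
            status = Status.BARRIER_PENDING;
            return true;
        }
        // The barrier for this checkpoint already arrived; nothing is in flight.
        return false;
    }

    Status getStatus() {
        return status;
    }
}
```

In this sketch, a channel that saw only barrier 1 accepts `startPersisting(2)`, whereas a channel that already saw barrier 2 rejects `startPersisting(1)`.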
   Second:
   ```
       This commit fixes a bug where RemoteInputChannel incorrectly decided which
       buffers should be spilled if it had received an obsolete CheckpointBarrier
       that had not (yet) been cancelled.
   ```
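A minimal sketch of the spilling decision may help here. Everything in it is an assumption made for illustration: `ToySpillDecision`, `buffersToSpill`, and the representation of buffers as bare sequence numbers are hypothetical and do not mirror `RemoteInputChannel`'s real fields or methods. The idea shown is that a remembered barrier position only limits which buffers are spilled when that barrier belongs to the checkpoint being started; a barrier from an older checkpoint is ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy spilling decision; a hypothetical model, not Flink's RemoteInputChannel. */
final class ToySpillDecision {
    /** Marker for "no barrier seen on this channel yet". */
    static final long NONE = -1L;

    /**
     * @param queuedSeqNums sequence numbers of the data buffers currently queued
     * @param checkpointId the checkpoint being started
     * @param lastBarrierId id of the last barrier seen on this channel, or NONE
     * @param lastBarrierSeqNum sequence number at which that barrier arrived
     * @return sequence numbers of the buffers to spill as in-flight data
     */
    static List<Integer> buffersToSpill(
            List<Integer> queuedSeqNums,
            long checkpointId,
            long lastBarrierId,
            int lastBarrierSeqNum) {
        if (lastBarrierId != NONE && checkpointId < lastBarrierId) {
            // A newer barrier already overwrote the position for this checkpoint.
            throw new IllegalStateException(
                    "Checkpoint " + checkpointId + " was subsumed by " + lastBarrierId);
        }
        // An obsolete barrier (lastBarrierId < checkpointId) says nothing about
        // the checkpoint being started, so its position must not be used.
        boolean barrierIsCurrent = lastBarrierId == checkpointId;
        List<Integer> toSpill = new ArrayList<>();
        for (int seq : queuedSeqNums) {
            if (!barrierIsCurrent || seq < lastBarrierSeqNum) {
                toSpill.add(seq);
            }
        }
        return toSpill;
    }
}
```

With queued buffers 5 to 8 and an obsolete barrier for checkpoint 1 at position 6, starting checkpoint 2 spills all four buffers in this model; if the barrier at position 7 belongs to checkpoint 2 itself, only buffers 5 and 6 are spilled.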
   Both commits are tested by the existing UnalignedCheckpointITCase and some freshly added unit tests.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (**yes** / no / don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / **no**)
     - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14807:
URL: https://github.com/apache/flink/pull/14807#issuecomment-769784165


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f5901e6ca1e67fd29980bac95986ff5640a8976c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f5901e6ca1e67fd29980bac95986ff5640a8976c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3a9e5890fe04ff480f5623bd752ed363a573d58a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3a9e5890fe04ff480f5623bd752ed363a573d58a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fb5b2ecf70c62048073ba8677fe27baa02a63e7c UNKNOWN
   * f5901e6ca1e67fd29980bac95986ff5640a8976c UNKNOWN
   * 3a9e5890fe04ff480f5623bd752ed363a573d58a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14807:
URL: https://github.com/apache/flink/pull/14807#issuecomment-769784165


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f5901e6ca1e67fd29980bac95986ff5640a8976c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f5901e6ca1e67fd29980bac95986ff5640a8976c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fb5b2ecf70c62048073ba8677fe27baa02a63e7c UNKNOWN
   * f5901e6ca1e67fd29980bac95986ff5640a8976c UNKNOWN
   



[GitHub] [flink] AHeise closed pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
AHeise closed pull request #14807:
URL: https://github.com/apache/flink/pull/14807


   





[GitHub] [flink] flinkbot commented on pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #14807:
URL: https://github.com/apache/flink/pull/14807#issuecomment-769784165


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fb5b2ecf70c62048073ba8677fe27baa02a63e7c UNKNOWN
   



[GitHub] [flink] flinkbot edited a comment on pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14807:
URL: https://github.com/apache/flink/pull/14807#issuecomment-769784165


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f5901e6ca1e67fd29980bac95986ff5640a8976c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f5901e6ca1e67fd29980bac95986ff5640a8976c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3a9e5890fe04ff480f5623bd752ed363a573d58a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3a9e5890fe04ff480f5623bd752ed363a573d58a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fac7f50c07fff96245ddcb770aa9ea01b747ef21",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12660",
       "triggerID" : "fac7f50c07fff96245ddcb770aa9ea01b747ef21",
       "triggerType" : "PUSH"
     }, {
       "hash" : "037bbc7022faf40d87a9cb48b0dbd79587befdc7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12656",
       "triggerID" : "037bbc7022faf40d87a9cb48b0dbd79587befdc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4c752cedc9e008227243ce184670b4f130e98601",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12677",
       "triggerID" : "4c752cedc9e008227243ce184670b4f130e98601",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fb5b2ecf70c62048073ba8677fe27baa02a63e7c UNKNOWN
   * f5901e6ca1e67fd29980bac95986ff5640a8976c UNKNOWN
   * 3a9e5890fe04ff480f5623bd752ed363a573d58a UNKNOWN
   * 037bbc7022faf40d87a9cb48b0dbd79587befdc7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12656) 
   * 4c752cedc9e008227243ce184670b4f130e98601 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12677) 
   



[GitHub] [flink] AHeise commented on a change in pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #14807:
URL: https://github.com/apache/flink/pull/14807#discussion_r566953027



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/consumer/ChannelStatePersisterTest.java
##########
@@ -65,38 +83,58 @@ public void testNewBarrierNotOverwrittenByCheckForBarrier() throws IOException {
     }
 
     @Test
-    public void testLateBarrierOnCancelledCheckpoint() throws IOException {
-        ChannelStatePersister persister =
-                new ChannelStatePersister(ChannelStateWriter.NO_OP, new InputChannelInfo(0, 0));
+    public void testLateBarrierOnStartedAndCancelledCheckpoint() throws Exception {
+        testLateBarrier(true, true);
+    }
 
-        persister.startPersisting(1L, Collections.emptyList());
-        // checkpoint aborted
-        persister.stopPersisting(1L);
+    @Test
+    public void testLateBarrierOnCancelledCheckpoint() throws Exception {
+        testLateBarrier(false, true);
+    }
 
-        // late barrier
-        persister.checkForBarrier(barrier(1L));
+    @Test
+    public void testLateBarrierOnNotYetCancelledCheckpoint() throws Exception {
+        testLateBarrier(false, false);
+    }
 
-        persister.startPersisting(2L, Collections.emptyList());
-        persister.checkForBarrier(barrier(2L));
+    public void testLateBarrier(

Review comment:
       nit: private

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java
##########
@@ -638,6 +637,11 @@ private void notifyPriorityEventForce() {
                     CheckpointFailureReason
                             .CHECKPOINT_SUBSUMED); // currently, at most one active unaligned
             // checkpoint is possible
+        } else if (checkpointId > lastBarrierId) {
+            // This channel has received some obsolete barrier, older compared to the checkpointId
+            // which we are processing right now, and we should ignore that obsoleted checkpoint
+            // barrier sequence number.
+            resetLastBarrier();

Review comment:
       I wouldn't introduce a side effect here; I'd just return an empty list (which I'd also return if the side effect stays).

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/ChannelStatePersister.java
##########
@@ -67,16 +68,27 @@
         this.channelInfo = checkNotNull(channelInfo);
     }
 
-    protected void startPersisting(long barrierId, List<Buffer> knownBuffers) {
+    protected void startPersisting(long barrierId, List<Buffer> knownBuffers)
+            throws CheckpointException {
         logEvent("startPersisting", barrierId);
-        if (checkpointStatus != CheckpointStatus.BARRIER_RECEIVED && lastSeenBarrier < barrierId) {
+        if (checkpointStatus == CheckpointStatus.BARRIER_RECEIVED && lastSeenBarrier > barrierId) {
+            throw new CheckpointException(

Review comment:
       How does it guarantee that this doesn't cancel a newer checkpoint? I don't see any checkpoint id being bound to the exception.
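
The concern can be made concrete with a small sketch. It is purely hypothetical: `ToyCheckpointAbort` and `ToyCoordinator` are invented names, and nothing here is taken from the PR or from Flink's `CheckpointException`. It shows one way a failure could carry the id of the checkpoint it refers to, so that the handler ignores it when a different checkpoint is in progress.

```java
/** Hypothetical failure type that records which checkpoint it refers to. */
final class ToyCheckpointAbort extends Exception {
    private final long checkpointId;

    ToyCheckpointAbort(long checkpointId, String message) {
        super(message);
        this.checkpointId = checkpointId;
    }

    long getCheckpointId() {
        return checkpointId;
    }
}

/** Hypothetical handler that tracks a single in-progress checkpoint. */
final class ToyCoordinator {
    private long currentCheckpointId = -1L;
    private boolean currentAborted;

    void trigger(long checkpointId) {
        currentCheckpointId = checkpointId;
        currentAborted = false;
    }

    /**
     * Aborts the in-progress checkpoint only if the failure refers to it.
     *
     * @return true if the in-progress checkpoint was aborted
     */
    boolean onAbort(ToyCheckpointAbort abort) {
        if (abort.getCheckpointId() != currentCheckpointId) {
            // Failure for some other (for example obsolete) checkpoint: ignore it.
            return false;
        }
        currentAborted = true;
        return true;
    }

    boolean isCurrentAborted() {
        return currentAborted;
    }
}
```

In this model an abort tagged with checkpoint 1 leaves an in-progress checkpoint 2 untouched, while an abort tagged with checkpoint 2 cancels it.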







[GitHub] [flink] flinkbot edited a comment on pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14807:
URL: https://github.com/apache/flink/pull/14807#issuecomment-769784165


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f5901e6ca1e67fd29980bac95986ff5640a8976c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f5901e6ca1e67fd29980bac95986ff5640a8976c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3a9e5890fe04ff480f5623bd752ed363a573d58a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3a9e5890fe04ff480f5623bd752ed363a573d58a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fac7f50c07fff96245ddcb770aa9ea01b747ef21",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12660",
       "triggerID" : "fac7f50c07fff96245ddcb770aa9ea01b747ef21",
       "triggerType" : "PUSH"
     }, {
       "hash" : "037bbc7022faf40d87a9cb48b0dbd79587befdc7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12656",
       "triggerID" : "037bbc7022faf40d87a9cb48b0dbd79587befdc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4c752cedc9e008227243ce184670b4f130e98601",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12677",
       "triggerID" : "4c752cedc9e008227243ce184670b4f130e98601",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fb5b2ecf70c62048073ba8677fe27baa02a63e7c UNKNOWN
   * f5901e6ca1e67fd29980bac95986ff5640a8976c UNKNOWN
   * 3a9e5890fe04ff480f5623bd752ed363a573d58a UNKNOWN
   * 4c752cedc9e008227243ce184670b4f130e98601 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12677) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14807:
URL: https://github.com/apache/flink/pull/14807#issuecomment-769784165


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f5901e6ca1e67fd29980bac95986ff5640a8976c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f5901e6ca1e67fd29980bac95986ff5640a8976c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3a9e5890fe04ff480f5623bd752ed363a573d58a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3a9e5890fe04ff480f5623bd752ed363a573d58a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fac7f50c07fff96245ddcb770aa9ea01b747ef21",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12660",
       "triggerID" : "fac7f50c07fff96245ddcb770aa9ea01b747ef21",
       "triggerType" : "PUSH"
     }, {
       "hash" : "037bbc7022faf40d87a9cb48b0dbd79587befdc7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12656",
       "triggerID" : "037bbc7022faf40d87a9cb48b0dbd79587befdc7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fb5b2ecf70c62048073ba8677fe27baa02a63e7c UNKNOWN
   * f5901e6ca1e67fd29980bac95986ff5640a8976c UNKNOWN
   * 3a9e5890fe04ff480f5623bd752ed363a573d58a UNKNOWN
   * 037bbc7022faf40d87a9cb48b0dbd79587befdc7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12656) 
   



[GitHub] [flink] flinkbot commented on pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #14807:
URL: https://github.com/apache/flink/pull/14807#issuecomment-769774406


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit fb5b2ecf70c62048073ba8677fe27baa02a63e7c (Fri Jan 29 12:23:33 UTC 2021)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14807:
URL: https://github.com/apache/flink/pull/14807#issuecomment-769784165


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f5901e6ca1e67fd29980bac95986ff5640a8976c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f5901e6ca1e67fd29980bac95986ff5640a8976c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3a9e5890fe04ff480f5623bd752ed363a573d58a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3a9e5890fe04ff480f5623bd752ed363a573d58a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fac7f50c07fff96245ddcb770aa9ea01b747ef21",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12660",
       "triggerID" : "fac7f50c07fff96245ddcb770aa9ea01b747ef21",
       "triggerType" : "PUSH"
     }, {
       "hash" : "037bbc7022faf40d87a9cb48b0dbd79587befdc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "037bbc7022faf40d87a9cb48b0dbd79587befdc7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fb5b2ecf70c62048073ba8677fe27baa02a63e7c UNKNOWN
   * f5901e6ca1e67fd29980bac95986ff5640a8976c UNKNOWN
   * 3a9e5890fe04ff480f5623bd752ed363a573d58a UNKNOWN
   * fac7f50c07fff96245ddcb770aa9ea01b747ef21 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12660) 
   * 037bbc7022faf40d87a9cb48b0dbd79587befdc7 UNKNOWN
   



[GitHub] [flink] pnowojski commented on a change in pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
pnowojski commented on a change in pull request #14807:
URL: https://github.com/apache/flink/pull/14807#discussion_r566975267



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/ChannelStatePersister.java
##########
@@ -67,16 +68,27 @@
         this.channelInfo = checkNotNull(channelInfo);
     }
 
-    protected void startPersisting(long barrierId, List<Buffer> knownBuffers) {
+    protected void startPersisting(long barrierId, List<Buffer> knownBuffers)
+            throws CheckpointException {
         logEvent("startPersisting", barrierId);
-        if (checkpointStatus != CheckpointStatus.BARRIER_RECEIVED && lastSeenBarrier < barrierId) {
+        if (checkpointStatus == CheckpointStatus.BARRIER_RECEIVED && lastSeenBarrier > barrierId) {
+            throw new CheckpointException(

Review comment:
       As discussed offline, it's cancelling the currently triggered checkpoint.







[GitHub] [flink] flinkbot edited a comment on pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14807:
URL: https://github.com/apache/flink/pull/14807#issuecomment-769774406


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 4c752cedc9e008227243ce184670b4f130e98601 (Fri May 28 08:15:08 UTC 2021)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14807:
URL: https://github.com/apache/flink/pull/14807#issuecomment-769784165


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fb5b2ecf70c62048073ba8677fe27baa02a63e7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f5901e6ca1e67fd29980bac95986ff5640a8976c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f5901e6ca1e67fd29980bac95986ff5640a8976c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3a9e5890fe04ff480f5623bd752ed363a573d58a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3a9e5890fe04ff480f5623bd752ed363a573d58a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fac7f50c07fff96245ddcb770aa9ea01b747ef21",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fac7f50c07fff96245ddcb770aa9ea01b747ef21",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fb5b2ecf70c62048073ba8677fe27baa02a63e7c UNKNOWN
   * f5901e6ca1e67fd29980bac95986ff5640a8976c UNKNOWN
   * 3a9e5890fe04ff480f5623bd752ed363a573d58a UNKNOWN
   * fac7f50c07fff96245ddcb770aa9ea01b747ef21 UNKNOWN
   



[GitHub] [flink] flinkbot edited a comment on pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14807:
URL: https://github.com/apache/flink/pull/14807#issuecomment-769784165


   ## CI report:
   
   * fb5b2ecf70c62048073ba8677fe27baa02a63e7c UNKNOWN
   * f5901e6ca1e67fd29980bac95986ff5640a8976c UNKNOWN
   * 3a9e5890fe04ff480f5623bd752ed363a573d58a UNKNOWN
   * fac7f50c07fff96245ddcb770aa9ea01b747ef21 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12660) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14807:
URL: https://github.com/apache/flink/pull/14807#issuecomment-769784165


   ## CI report:
   
   * fb5b2ecf70c62048073ba8677fe27baa02a63e7c UNKNOWN
   * f5901e6ca1e67fd29980bac95986ff5640a8976c UNKNOWN
   * 3a9e5890fe04ff480f5623bd752ed363a573d58a UNKNOWN
   * 037bbc7022faf40d87a9cb48b0dbd79587befdc7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12656) 
   * 4c752cedc9e008227243ce184670b4f130e98601 UNKNOWN
   



[GitHub] [flink] AHeise commented on pull request #14807: [FLINK-20654][FLINK-21104][network] Fix two bugs in the handling of UnalignedCheckpoints

Posted by GitBox <gi...@apache.org>.
AHeise commented on pull request #14807:
URL: https://github.com/apache/flink/pull/14807#issuecomment-770020842


   Incorporated into #14797.

