You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/05/11 07:20:55 UTC

[GitHub] [flink] TsReaper opened a new pull request #12073: [FLINK-14807][table] Add specialized collecting iterator to Blink planner

TsReaper opened a new pull request #12073:
URL: https://github.com/apache/flink/pull/12073


   ## Pre-review note
   
   This PR is waiting for #12037 and #12069 to merge. The first two commits are exactly the same with the blocking PRs. Please start reviewing from the 3rd commit. This PR will be rebased once the blocking PRs are merged.
   
   ## What is the purpose of the change
   
   This PR is part of the [FLINK-14807](https://issues.apache.org/jira/browse/FLINK-14807) which is going to introduce a collecting method for tables. See [here](https://docs.google.com/document/d/13Ata18-e89_hAdfukzEJYreOg2FBZO_Y0RohLDAme6Y/edit) for the whole design document.
   
   This PR introduces two specialized collecting iterator to Blink planner. One iterator aims to deliver the results as fast as possible, while the other focuses on providing exactly-once semantics.
   
   ## Brief change log
   
   - Add specialized collecting iterator to Blink planner
   
   ## Verifying this change
   
   This change can be verified by running the newly added unit tests.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? no
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] KurtYoung merged pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
KurtYoung merged pull request #12073:
URL: https://github.com/apache/flink/pull/12073


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-14807][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * bfcf9f36fe1ca742b962367eaf85af68a5c94962 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * e85225100da7e5d3f06934fa331c1712a0ed4354 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289) 
   * 480bb0e56b40f3dadc7e33f96ce417a38ccc4aca Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626520703


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 41d3524e0ba410d80f17f86206191b146d34e460 (Fri Oct 16 10:34:33 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] KurtYoung commented on a change in pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
KurtYoung commented on a change in pull request #12073:
URL: https://github.com/apache/flink/pull/12073#discussion_r426226686



##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (jobTerminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {

Review comment:
       change this to `do {} while`, maybe we can reuse these statements with line 110 - line 117?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574",
       "triggerID" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * ca26d6edd7772ee46d24b05c01952a10887eb3f7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #12073: [FLINK-14807][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626520703


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit bfcf9f36fe1ca742b962367eaf85af68a5c94962 (Mon May 11 07:23:49 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-14807][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4f221bd54f57e1d67cf65527de1bee3b0473bdb6 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031) 
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] StephanEwen commented on pull request #12073: [FLINK-17735][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
StephanEwen commented on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-629691533


   Is the description here correct? The classes do not seem to relate to the Blink planner, but are all in the `flink-streaming-java` module.
   
   I also don't understand how this is related specifically to SQL - isn't this a client side functionality, meaning it would be run as part of the JobClient (and used by the SQL Shell)?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-14807][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * bfcf9f36fe1ca742b962367eaf85af68a5c94962 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574",
       "triggerID" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1588",
       "triggerID" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cc45779cd014b587a2fbed2393683ff4fe73a38b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1600",
       "triggerID" : "cc45779cd014b587a2fbed2393683ff4fe73a38b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "db16ab869545c73c16e0a3bec6c4482cb3331619",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "db16ab869545c73c16e0a3bec6c4482cb3331619",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * 5ff489bbcfdc393ed5835f5d0183d374cd0acb3f UNKNOWN
   * 579236dbe523ccc71e62f4f9becdf0937d3b1bf4 UNKNOWN
   * 4ad199a9f0a8db9ce82493af6e180af602774e29 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1588) 
   * cc45779cd014b587a2fbed2393683ff4fe73a38b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1600) 
   * db16ab869545c73c16e0a3bec6c4482cb3331619 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #12073: [FLINK-14807][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * bfcf9f36fe1ca742b962367eaf85af68a5c94962 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] KurtYoung commented on a change in pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
KurtYoung commented on a change in pull request #12073:
URL: https://github.com/apache/flink/pull/12073#discussion_r426208132



##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	private JobClient jobClient;
+	private boolean terminated;

Review comment:
       rename to `jobTerminated`

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	private JobClient jobClient;
+	private boolean terminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	@VisibleForTesting

Review comment:
       doesn't see any tests relying on this method

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	private JobClient jobClient;
+	private boolean terminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	@VisibleForTesting
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.terminated = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (terminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				terminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (terminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+		CoordinationRequestGateway gateway = (CoordinationRequestGateway) jobClient;
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);
+		}
+
+		ArrayList<byte[]> accResults = executionResult.getAccumulatorResult(accumulatorName);
+		if (accResults == null) {
+			// job terminates abnormally
+			return null;
+		}
+
+		try {
+			List<byte[]> serializedResults =
+				SerializedListAccumulator.deserializeList(accResults, BytePrimitiveArraySerializer.INSTANCE);
+			byte[] serializedResult = serializedResults.get(0);
+			return CollectSinkFunction.deserializeAccumulatorResult(serializedResult);
+		} catch (ClassNotFoundException | IOException e) {
+			// this is impossible
+			throw new RuntimeException("Failed to deserialize accumulator results", e);
+		}
+	}
+
+	private boolean isJobTerminated() {
+		checkJobClientConfigured();
+
+		try {
+			JobStatus status = jobClient.getJobStatus().get();
+			return status.isGloballyTerminalState();
+		} catch (Exception e) {
+			// TODO
+			//  This is sort of hack.
+			//  Currently different execution environment will have different behaviors
+			//  when fetching a finished job status.
+			//  For example, standalone session cluster will return a normal FINISHED,
+			//  while mini cluster will throw IllegalStateException,
+			//  and yarn per job will throw ApplicationNotFoundException.
+			//  We have to assume that job has finished in this case.
+			//  Change this when these behaviors are unified.
+			LOG.warn("Failed to get job status so we assume that the job has terminated. Some data might be lost.", e);
+			return true;
+		}
+	}
+
+	private void cancelJob() {
+		checkJobClientConfigured();
+
+		if (!isJobTerminated()) {
+			jobClient.cancel();
+		}
+	}
+
+	private void sleepBeforeRetry() {
+		if (retryMillis <= 0) {
+			return;
+		}
+
+		try {
+			// TODO a more proper retry strategy?
+			Thread.sleep(retryMillis);
+		} catch (InterruptedException e) {
+			LOG.warn("Interrupted when sleeping before a retry", e);
+		}
+	}
+
+	private void checkJobClientConfigured() {
+		Preconditions.checkNotNull(jobClient, "Job client must be configured before first use.");
+	}
+
+	private class ResultBuffer {
+
+		private static final String INIT_VERSION = "";
+
+		private final LinkedList<T> buffer;
+		private final TypeSerializer<T> serializer;
+
+		private String version;
+		private long offset;
+		private long lastCheckpointedOffset;
+		private long userHead;
+		private long userTail;
+
+		private ResultBuffer(TypeSerializer<T> serializer) {
+			this.buffer = new LinkedList<>();
+			this.serializer = serializer;
+
+			this.version = INIT_VERSION;
+			this.offset = 0;
+			this.lastCheckpointedOffset = 0;
+			this.userHead = 0;
+			this.userTail = 0;
+		}
+
+		private T next() {
+			if (userHead == userTail) {
+				return null;
+			}
+			T ret = buffer.removeFirst();
+			userHead++;
+
+			sanityCheck();
+			return ret;
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response) {
+			dealWithResponse(response, offset);
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response, long responseOffset) {
+			String responseVersion = response.getVersion();
+			long responseLastCheckpointedOffset = response.getLastCheckpointedOffset();
+			List<T> results;
+			try {
+				results = response.getResults(serializer);
+			} catch (IOException e) {
+				LOG.warn("An exception occurs when deserializing query results. Some results might be lost.", e);
+				results = Collections.emptyList();
+			}
+
+			if (responseLastCheckpointedOffset > lastCheckpointedOffset) {

Review comment:
       when will `responseLastCheckpointedOffset < lastCheckpointedOffset`(I can understand when they are equal), and what would you do with this situation?

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	private JobClient jobClient;

Review comment:
       add `@Nullable`

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	private JobClient jobClient;
+	private boolean terminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	@VisibleForTesting
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.terminated = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (terminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				terminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (terminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+		CoordinationRequestGateway gateway = (CoordinationRequestGateway) jobClient;
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);
+		}
+
+		ArrayList<byte[]> accResults = executionResult.getAccumulatorResult(accumulatorName);
+		if (accResults == null) {
+			// job terminates abnormally
+			return null;
+		}
+
+		try {
+			List<byte[]> serializedResults =
+				SerializedListAccumulator.deserializeList(accResults, BytePrimitiveArraySerializer.INSTANCE);
+			byte[] serializedResult = serializedResults.get(0);
+			return CollectSinkFunction.deserializeAccumulatorResult(serializedResult);
+		} catch (ClassNotFoundException | IOException e) {
+			// this is impossible
+			throw new RuntimeException("Failed to deserialize accumulator results", e);
+		}
+	}
+
+	private boolean isJobTerminated() {
+		checkJobClientConfigured();
+
+		try {
+			JobStatus status = jobClient.getJobStatus().get();
+			return status.isGloballyTerminalState();
+		} catch (Exception e) {
+			// TODO
+			//  This is sort of hack.
+			//  Currently different execution environment will have different behaviors
+			//  when fetching a finished job status.
+			//  For example, standalone session cluster will return a normal FINISHED,
+			//  while mini cluster will throw IllegalStateException,
+			//  and yarn per job will throw ApplicationNotFoundException.
+			//  We have to assume that job has finished in this case.
+			//  Change this when these behaviors are unified.
+			LOG.warn("Failed to get job status so we assume that the job has terminated. Some data might be lost.", e);
+			return true;
+		}
+	}
+
+	private void cancelJob() {
+		checkJobClientConfigured();
+
+		if (!isJobTerminated()) {
+			jobClient.cancel();
+		}
+	}
+
+	private void sleepBeforeRetry() {
+		if (retryMillis <= 0) {
+			return;
+		}
+
+		try {
+			// TODO a more proper retry strategy?
+			Thread.sleep(retryMillis);
+		} catch (InterruptedException e) {
+			LOG.warn("Interrupted when sleeping before a retry", e);
+		}
+	}
+
+	private void checkJobClientConfigured() {
+		Preconditions.checkNotNull(jobClient, "Job client must be configured before first use.");
+	}
+
+	private class ResultBuffer {
+
+		private static final String INIT_VERSION = "";
+
+		private final LinkedList<T> buffer;
+		private final TypeSerializer<T> serializer;
+
+		private String version;
+		private long offset;
+		private long lastCheckpointedOffset;
+		private long userHead;

Review comment:
       how about `userVisibleHead`

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	private JobClient jobClient;
+	private boolean terminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	@VisibleForTesting
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.terminated = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (terminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				terminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (terminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+		CoordinationRequestGateway gateway = (CoordinationRequestGateway) jobClient;
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);
+		}
+
+		ArrayList<byte[]> accResults = executionResult.getAccumulatorResult(accumulatorName);
+		if (accResults == null) {
+			// job terminates abnormally
+			return null;
+		}
+
+		try {
+			List<byte[]> serializedResults =
+				SerializedListAccumulator.deserializeList(accResults, BytePrimitiveArraySerializer.INSTANCE);
+			byte[] serializedResult = serializedResults.get(0);
+			return CollectSinkFunction.deserializeAccumulatorResult(serializedResult);
+		} catch (ClassNotFoundException | IOException e) {
+			// this is impossible
+			throw new RuntimeException("Failed to deserialize accumulator results", e);
+		}
+	}
+
+	private boolean isJobTerminated() {
+		checkJobClientConfigured();
+
+		try {
+			JobStatus status = jobClient.getJobStatus().get();
+			return status.isGloballyTerminalState();
+		} catch (Exception e) {
+			// TODO
+			//  This is sort of hack.
+			//  Currently different execution environment will have different behaviors
+			//  when fetching a finished job status.
+			//  For example, standalone session cluster will return a normal FINISHED,
+			//  while mini cluster will throw IllegalStateException,
+			//  and yarn per job will throw ApplicationNotFoundException.
+			//  We have to assume that job has finished in this case.
+			//  Change this when these behaviors are unified.
+			LOG.warn("Failed to get job status so we assume that the job has terminated. Some data might be lost.", e);
+			return true;
+		}
+	}
+
+	private void cancelJob() {
+		checkJobClientConfigured();
+
+		if (!isJobTerminated()) {
+			jobClient.cancel();
+		}
+	}
+
+	private void sleepBeforeRetry() {
+		if (retryMillis <= 0) {
+			return;
+		}
+
+		try {
+			// TODO a more proper retry strategy?
+			Thread.sleep(retryMillis);
+		} catch (InterruptedException e) {
+			LOG.warn("Interrupted when sleeping before a retry", e);
+		}
+	}
+
+	private void checkJobClientConfigured() {
+		Preconditions.checkNotNull(jobClient, "Job client must be configured before first use.");
+	}
+
+	private class ResultBuffer {
+
+		private static final String INIT_VERSION = "";
+
+		private final LinkedList<T> buffer;
+		private final TypeSerializer<T> serializer;
+
+		private String version;
+		private long offset;
+		private long lastCheckpointedOffset;
+		private long userHead;
+		private long userTail;
+
+		private ResultBuffer(TypeSerializer<T> serializer) {
+			this.buffer = new LinkedList<>();
+			this.serializer = serializer;
+
+			this.version = INIT_VERSION;
+			this.offset = 0;
+			this.lastCheckpointedOffset = 0;
+			this.userHead = 0;
+			this.userTail = 0;
+		}
+
+		private T next() {
+			if (userHead == userTail) {
+				return null;
+			}
+			T ret = buffer.removeFirst();
+			userHead++;
+
+			sanityCheck();
+			return ret;
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response) {
+			dealWithResponse(response, offset);
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response, long responseOffset) {
+			String responseVersion = response.getVersion();
+			long responseLastCheckpointedOffset = response.getLastCheckpointedOffset();
+			List<T> results;
+			try {
+				results = response.getResults(serializer);
+			} catch (IOException e) {

Review comment:
       I don't think we should eat this exception, we should throw this out to show something is going wrong

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	private JobClient jobClient;
+	private boolean terminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	@VisibleForTesting
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.terminated = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;

Review comment:
       use another field to save `CoordinationRequestGateway`, can save some casting in the future

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	private JobClient jobClient;
+	private boolean terminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	@VisibleForTesting
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.terminated = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (terminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				terminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (terminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+		CoordinationRequestGateway gateway = (CoordinationRequestGateway) jobClient;
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);
+		}
+
+		ArrayList<byte[]> accResults = executionResult.getAccumulatorResult(accumulatorName);
+		if (accResults == null) {
+			// job terminates abnormally
+			return null;
+		}
+
+		try {
+			List<byte[]> serializedResults =
+				SerializedListAccumulator.deserializeList(accResults, BytePrimitiveArraySerializer.INSTANCE);
+			byte[] serializedResult = serializedResults.get(0);
+			return CollectSinkFunction.deserializeAccumulatorResult(serializedResult);
+		} catch (ClassNotFoundException | IOException e) {
+			// this is impossible
+			throw new RuntimeException("Failed to deserialize accumulator results", e);
+		}
+	}
+
+	private boolean isJobTerminated() {
+		checkJobClientConfigured();
+
+		try {
+			JobStatus status = jobClient.getJobStatus().get();
+			return status.isGloballyTerminalState();
+		} catch (Exception e) {
+			// TODO
+			//  This is sort of hack.
+			//  Currently different execution environment will have different behaviors
+			//  when fetching a finished job status.
+			//  For example, standalone session cluster will return a normal FINISHED,
+			//  while mini cluster will throw IllegalStateException,
+			//  and yarn per job will throw ApplicationNotFoundException.
+			//  We have to assume that job has finished in this case.
+			//  Change this when these behaviors are unified.
+			LOG.warn("Failed to get job status so we assume that the job has terminated. Some data might be lost.", e);
+			return true;
+		}
+	}
+
+	private void cancelJob() {
+		checkJobClientConfigured();
+
+		if (!isJobTerminated()) {
+			jobClient.cancel();
+		}
+	}
+
+	private void sleepBeforeRetry() {
+		if (retryMillis <= 0) {
+			return;
+		}
+
+		try {
+			// TODO a more proper retry strategy?
+			Thread.sleep(retryMillis);
+		} catch (InterruptedException e) {
+			LOG.warn("Interrupted when sleeping before a retry", e);
+		}
+	}
+
+	private void checkJobClientConfigured() {
+		Preconditions.checkNotNull(jobClient, "Job client must be configured before first use.");
+	}
+
+	private class ResultBuffer {
+
+		private static final String INIT_VERSION = "";
+
+		private final LinkedList<T> buffer;
+		private final TypeSerializer<T> serializer;
+
+		private String version;
+		private long offset;
+		private long lastCheckpointedOffset;
+		private long userHead;

Review comment:
       We need some document about these variables and ResultBuffer, I'm quite confused now even if I know the algorithm...

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	private JobClient jobClient;
+	private boolean terminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	@VisibleForTesting
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+

Review comment:
       init `closed` variable




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574",
       "triggerID" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1588",
       "triggerID" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cc45779cd014b587a2fbed2393683ff4fe73a38b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1600",
       "triggerID" : "cc45779cd014b587a2fbed2393683ff4fe73a38b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * ca26d6edd7772ee46d24b05c01952a10887eb3f7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574) 
   * 5ff489bbcfdc393ed5835f5d0183d374cd0acb3f UNKNOWN
   * 579236dbe523ccc71e62f4f9becdf0937d3b1bf4 UNKNOWN
   * 4ad199a9f0a8db9ce82493af6e180af602774e29 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1588) 
   * cc45779cd014b587a2fbed2393683ff4fe73a38b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1600) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574",
       "triggerID" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * ca26d6edd7772ee46d24b05c01952a10887eb3f7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574) 
   * 5ff489bbcfdc393ed5835f5d0183d374cd0acb3f UNKNOWN
   * 579236dbe523ccc71e62f4f9becdf0937d3b1bf4 UNKNOWN
   * 4ad199a9f0a8db9ce82493af6e180af602774e29 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-14807][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4f221bd54f57e1d67cf65527de1bee3b0473bdb6 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031) 
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * e85225100da7e5d3f06934fa331c1712a0ed4354 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] TsReaper commented on pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
TsReaper commented on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-629774445


   Azure passed in https://dev.azure.com/tsreaper96/Flink/_build/results?buildId=20&view=results


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * e85225100da7e5d3f06934fa331c1712a0ed4354 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289) 
   * 480bb0e56b40f3dadc7e33f96ce417a38ccc4aca Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543) 
   * 7152ce26d8228590b5449280de54d95880bb1e5b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] TsReaper commented on a change in pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
TsReaper commented on a change in pull request #12073:
URL: https://github.com/apache/flink/pull/12073#discussion_r426227712



##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (jobTerminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);
+		}
+
+		ArrayList<byte[]> accResults = executionResult.getAccumulatorResult(accumulatorName);
+		if (accResults == null) {
+			// job terminates abnormally
+			return null;
+		}
+
+		try {
+			List<byte[]> serializedResults =
+				SerializedListAccumulator.deserializeList(accResults, BytePrimitiveArraySerializer.INSTANCE);
+			byte[] serializedResult = serializedResults.get(0);
+			return CollectSinkFunction.deserializeAccumulatorResult(serializedResult);
+		} catch (ClassNotFoundException | IOException e) {
+			// this is impossible
+			throw new RuntimeException("Failed to deserialize accumulator results", e);
+		}
+	}
+
+	private boolean isJobTerminated() {
+		checkJobClientConfigured();
+
+		try {
+			JobStatus status = jobClient.getJobStatus().get();
+			return status.isGloballyTerminalState();
+		} catch (Exception e) {
+			// TODO
+			//  This is sort of hack.
+			//  Currently different execution environment will have different behaviors
+			//  when fetching a finished job status.
+			//  For example, standalone session cluster will return a normal FINISHED,
+			//  while mini cluster will throw IllegalStateException,
+			//  and yarn per job will throw ApplicationNotFoundException.
+			//  We have to assume that job has finished in this case.
+			//  Change this when these behaviors are unified.
+			LOG.warn("Failed to get job status so we assume that the job has terminated. Some data might be lost.", e);
+			return true;
+		}
+	}
+
+	private void cancelJob() {
+		checkJobClientConfigured();
+
+		if (!isJobTerminated()) {
+			jobClient.cancel();
+		}
+	}
+
+	private void sleepBeforeRetry() {
+		if (retryMillis <= 0) {
+			return;
+		}
+
+		try {
+			// TODO a more proper retry strategy?
+			Thread.sleep(retryMillis);
+		} catch (InterruptedException e) {
+			LOG.warn("Interrupted when sleeping before a retry", e);
+		}
+	}
+
+	private void checkJobClientConfigured() {
+		Preconditions.checkNotNull(jobClient, "Job client must be configured before first use.");
+	}
+
+	/**
+	 * A buffer which encapsulates the logic of dealing with the response from the {@link CollectSinkFunction}.
+	 * See Java doc of {@link CollectSinkFunction} for explanation of this communication protocol.
+	 */
+	private class ResultBuffer {
+
+		private static final String INIT_VERSION = "";
+
+		private final LinkedList<T> buffer;
+		private final TypeSerializer<T> serializer;
+
+		// for detailed explanation of the following 3 variables, see Java doc of CollectSinkFunction
+		// `version` is to check if the sink restarts
+		private String version;
+		// `offset` is the offset of the next result we want to fetch
+		private long offset;
+
+		// userVisibleHead <= user visible results offset < userVisibleTail
+		private long userVisibleHead;
+		private long userVisibleTail;
+
+		private ResultBuffer(TypeSerializer<T> serializer) {
+			this.buffer = new LinkedList<>();
+			this.serializer = serializer;
+
+			this.version = INIT_VERSION;
+			this.offset = 0;
+
+			this.userVisibleHead = 0;
+			this.userVisibleTail = 0;
+		}
+
+		private T next() {
+			if (userVisibleHead == userVisibleTail) {
+				return null;
+			}
+			T ret = buffer.removeFirst();
+			userVisibleHead++;
+
+			sanityCheck();
+			return ret;
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response) {
+			dealWithResponse(response, offset);
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response, long responseOffset) {
+			String responseVersion = response.getVersion();
+			long responseLastCheckpointedOffset = response.getLastCheckpointedOffset();
+			List<T> results;
+			try {
+				results = response.getResults(serializer);
+			} catch (IOException e) {
+				throw new RuntimeException(e);
+			}
+
+			// we first check version in the response to decide whether we should throw away dirty results
+			if (!version.equals(responseVersion)) {
+				// sink restarted, we revert back to where the sink tells us
+				for (long i = 0; i < offset - responseLastCheckpointedOffset; i++) {
+					buffer.removeLast();
+				}
+				version = responseVersion;
+				offset = responseLastCheckpointedOffset;
+			}
+
+			// we now check if more results can be seen by the user
+			if (responseLastCheckpointedOffset > userVisibleTail) {
+				// lastCheckpointedOffset increases, this means that more results have been
+				// checkpointed, and we can give these results to the user
+				userVisibleTail = responseLastCheckpointedOffset;
+			}
+
+			if (!results.isEmpty()) {
+				// response contains some data, add them to buffer
+				int addStart = (int) (offset - responseOffset);
+				List<T> addedResults = results.subList(addStart, results.size());

Review comment:
       This is impossible. Because if this happens then the sink must have restarted, and we must have gone back to the last checkpointed offset. But I'll add a sanity check here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-14807][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * e85225100da7e5d3f06934fa331c1712a0ed4354 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] TsReaper commented on a change in pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
TsReaper commented on a change in pull request #12073:
URL: https://github.com/apache/flink/pull/12073#discussion_r426216888



##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	private JobClient jobClient;
+	private boolean terminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	@VisibleForTesting
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.terminated = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (terminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				terminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (terminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+		CoordinationRequestGateway gateway = (CoordinationRequestGateway) jobClient;
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);
+		}
+
+		ArrayList<byte[]> accResults = executionResult.getAccumulatorResult(accumulatorName);
+		if (accResults == null) {
+			// job terminates abnormally
+			return null;
+		}
+
+		try {
+			List<byte[]> serializedResults =
+				SerializedListAccumulator.deserializeList(accResults, BytePrimitiveArraySerializer.INSTANCE);
+			byte[] serializedResult = serializedResults.get(0);
+			return CollectSinkFunction.deserializeAccumulatorResult(serializedResult);
+		} catch (ClassNotFoundException | IOException e) {
+			// this is impossible
+			throw new RuntimeException("Failed to deserialize accumulator results", e);
+		}
+	}
+
+	private boolean isJobTerminated() {
+		checkJobClientConfigured();
+
+		try {
+			JobStatus status = jobClient.getJobStatus().get();
+			return status.isGloballyTerminalState();
+		} catch (Exception e) {
+			// TODO
+			//  This is sort of hack.
+			//  Currently different execution environment will have different behaviors
+			//  when fetching a finished job status.
+			//  For example, standalone session cluster will return a normal FINISHED,
+			//  while mini cluster will throw IllegalStateException,
+			//  and yarn per job will throw ApplicationNotFoundException.
+			//  We have to assume that job has finished in this case.
+			//  Change this when these behaviors are unified.
+			LOG.warn("Failed to get job status so we assume that the job has terminated. Some data might be lost.", e);
+			return true;
+		}
+	}
+
+	private void cancelJob() {
+		checkJobClientConfigured();
+
+		if (!isJobTerminated()) {
+			jobClient.cancel();
+		}
+	}
+
+	private void sleepBeforeRetry() {
+		if (retryMillis <= 0) {
+			return;
+		}
+
+		try {
+			// TODO a more proper retry strategy?
+			Thread.sleep(retryMillis);
+		} catch (InterruptedException e) {
+			LOG.warn("Interrupted when sleeping before a retry", e);
+		}
+	}
+
+	private void checkJobClientConfigured() {
+		Preconditions.checkNotNull(jobClient, "Job client must be configured before first use.");
+	}
+
+	private class ResultBuffer {
+
+		private static final String INIT_VERSION = "";
+
+		private final LinkedList<T> buffer;
+		private final TypeSerializer<T> serializer;
+
+		private String version;
+		private long offset;
+		private long lastCheckpointedOffset;
+		private long userHead;
+		private long userTail;
+
+		private ResultBuffer(TypeSerializer<T> serializer) {
+			this.buffer = new LinkedList<>();
+			this.serializer = serializer;
+
+			this.version = INIT_VERSION;
+			this.offset = 0;
+			this.lastCheckpointedOffset = 0;
+			this.userHead = 0;
+			this.userTail = 0;
+		}
+
+		private T next() {
+			if (userHead == userTail) {
+				return null;
+			}
+			T ret = buffer.removeFirst();
+			userHead++;
+
+			sanityCheck();
+			return ret;
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response) {
+			dealWithResponse(response, offset);
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response, long responseOffset) {
+			String responseVersion = response.getVersion();
+			long responseLastCheckpointedOffset = response.getLastCheckpointedOffset();
+			List<T> results;
+			try {
+				results = response.getResults(serializer);
+			} catch (IOException e) {
+				LOG.warn("An exception occurs when deserializing query results. Some results might be lost.", e);
+				results = Collections.emptyList();
+			}
+
+			if (responseLastCheckpointedOffset > lastCheckpointedOffset) {

Review comment:
       It is impossible for `responseLastCheckpointedOffset` to be smaller than `lastCheckpointedOffset`, offsets are always non-decreasing.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] KurtYoung commented on a change in pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
KurtYoung commented on a change in pull request #12073:
URL: https://github.com/apache/flink/pull/12073#discussion_r426248082



##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,358 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+
+		// in case that user neither reads all data nor closes the iterator
+		// this is only an insurance,
+		// it's the user's responsibility to close the iterator if he does not need it anymore
+		Runtime.getRuntime().addShutdownHook(new Thread(this::close));
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		// this is to avoid sleeping before first try
+		boolean beforeFirstTry = true;
+		do {
+			T res = buffer.next();
+			if (res != null) {
+				// we still have user-visible results, just use them
+				return res;
+			} else if (jobTerminated) {
+				// no user-visible results, but job has terminated, we have to return
+				return null;
+			} else if (!beforeFirstTry) {
+				// no results but job is still running, sleep before retry
+				sleepBeforeRetry();
+			}
+			beforeFirstTry = false;
+
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				try {
+					Tuple2<Long, CollectCoordinationResponse<T>> accResults = getAccumulatorResults();
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				} catch (IOException e) {
+					close();
+					throw new RuntimeException(

Review comment:
       how about throw IOException instead?

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,358 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+
+		// in case that user neither reads all data nor closes the iterator
+		// this is only an insurance,
+		// it's the user's responsibility to close the iterator if he does not need it anymore
+		Runtime.getRuntime().addShutdownHook(new Thread(this::close));
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		// this is to avoid sleeping before first try
+		boolean beforeFirstTry = true;
+		do {
+			T res = buffer.next();
+			if (res != null) {
+				// we still have user-visible results, just use them
+				return res;
+			} else if (jobTerminated) {
+				// no user-visible results, but job has terminated, we have to return
+				return null;
+			} else if (!beforeFirstTry) {
+				// no results but job is still running, sleep before retry
+				sleepBeforeRetry();
+			}
+			beforeFirstTry = false;
+
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				try {
+					Tuple2<Long, CollectCoordinationResponse<T>> accResults = getAccumulatorResults();
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				} catch (IOException e) {
+					close();
+					throw new RuntimeException(
+						"Failed to deal with final accumulator results, final batch of results are lost", e);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				long requestOffset = buffer.offset;
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, requestOffset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					continue;
+				}
+				// the response will contain data (if any) starting exactly from requested offset
+				try {
+					buffer.dealWithResponse(response, requestOffset);
+				} catch (IOException e) {
+					close();
+					throw new RuntimeException("Failed to deal with response from sink", e);

Review comment:
       ditto




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * 7152ce26d8228590b5449280de54d95880bb1e5b Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574",
       "triggerID" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * ca26d6edd7772ee46d24b05c01952a10887eb3f7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574) 
   * 5ff489bbcfdc393ed5835f5d0183d374cd0acb3f UNKNOWN
   * 579236dbe523ccc71e62f4f9becdf0937d3b1bf4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] TsReaper commented on a change in pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
TsReaper commented on a change in pull request #12073:
URL: https://github.com/apache/flink/pull/12073#discussion_r426230954



##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (jobTerminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {

Review comment:
       This will cause iterator to sleep before the first try, which is undesired.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574",
       "triggerID" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1588",
       "triggerID" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * ca26d6edd7772ee46d24b05c01952a10887eb3f7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574) 
   * 5ff489bbcfdc393ed5835f5d0183d374cd0acb3f UNKNOWN
   * 579236dbe523ccc71e62f4f9becdf0937d3b1bf4 UNKNOWN
   * 4ad199a9f0a8db9ce82493af6e180af602774e29 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1588) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] TsReaper commented on pull request #12073: [FLINK-17735][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
TsReaper commented on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-629730294


   > Is the description here correct? The classes do not seem to relate to the Blink planner, but are all in the `flink-streaming-java` module.
   > 
   > I also don't understand how this is related specifically to SQL - isn't this a client side functionality, meaning it would be run as part of the JobClient (and used by the SQL Shell)?
   
   We initially would like to implement an iterator which will spill large data onto disks on client side. Currently only `ResettableExternalBuffer` in Blink planner can achieve this conveniently. But after an offline discussion with @KurtYoung yesterday, we decided to first simplify the implementation to a memory-only version. So this iterator is not related to Blink planner now. I'll update the description.
   
   The iterator is not running in `JobClient` but actually uses `JobClient`. It is indeed runs in the client side and used by the users.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] TsReaper commented on a change in pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
TsReaper commented on a change in pull request #12073:
URL: https://github.com/apache/flink/pull/12073#discussion_r426229597



##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (jobTerminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);
+		}
+
+		ArrayList<byte[]> accResults = executionResult.getAccumulatorResult(accumulatorName);
+		if (accResults == null) {
+			// job terminates abnormally
+			return null;
+		}
+
+		try {
+			List<byte[]> serializedResults =
+				SerializedListAccumulator.deserializeList(accResults, BytePrimitiveArraySerializer.INSTANCE);
+			byte[] serializedResult = serializedResults.get(0);
+			return CollectSinkFunction.deserializeAccumulatorResult(serializedResult);
+		} catch (ClassNotFoundException | IOException e) {
+			// this is impossible
+			throw new RuntimeException("Failed to deserialize accumulator results", e);
+		}
+	}
+
+	private boolean isJobTerminated() {
+		checkJobClientConfigured();
+
+		try {
+			JobStatus status = jobClient.getJobStatus().get();
+			return status.isGloballyTerminalState();
+		} catch (Exception e) {
+			// TODO
+			//  This is sort of hack.
+			//  Currently different execution environment will have different behaviors
+			//  when fetching a finished job status.
+			//  For example, standalone session cluster will return a normal FINISHED,
+			//  while mini cluster will throw IllegalStateException,
+			//  and yarn per job will throw ApplicationNotFoundException.
+			//  We have to assume that job has finished in this case.
+			//  Change this when these behaviors are unified.
+			LOG.warn("Failed to get job status so we assume that the job has terminated. Some data might be lost.", e);
+			return true;
+		}
+	}
+
+	private void cancelJob() {
+		checkJobClientConfigured();
+
+		if (!isJobTerminated()) {
+			jobClient.cancel();
+		}
+	}
+
+	private void sleepBeforeRetry() {
+		if (retryMillis <= 0) {
+			return;
+		}
+
+		try {
+			// TODO a more proper retry strategy?
+			Thread.sleep(retryMillis);
+		} catch (InterruptedException e) {
+			LOG.warn("Interrupted when sleeping before a retry", e);
+		}
+	}
+
+	private void checkJobClientConfigured() {
+		Preconditions.checkNotNull(jobClient, "Job client must be configured before first use.");
+	}
+
+	/**
+	 * A buffer which encapsulates the logic of dealing with the response from the {@link CollectSinkFunction}.
+	 * See Java doc of {@link CollectSinkFunction} for explanation of this communication protocol.
+	 */
+	private class ResultBuffer {
+
+		private static final String INIT_VERSION = "";
+
+		private final LinkedList<T> buffer;
+		private final TypeSerializer<T> serializer;
+
+		// for detailed explanation of the following 3 variables, see Java doc of CollectSinkFunction
+		// `version` is to check if the sink restarts
+		private String version;
+		// `offset` is the offset of the next result we want to fetch
+		private long offset;
+
+		// userVisibleHead <= user visible results offset < userVisibleTail
+		private long userVisibleHead;
+		private long userVisibleTail;
+
+		private ResultBuffer(TypeSerializer<T> serializer) {
+			this.buffer = new LinkedList<>();
+			this.serializer = serializer;
+
+			this.version = INIT_VERSION;
+			this.offset = 0;
+
+			this.userVisibleHead = 0;
+			this.userVisibleTail = 0;
+		}
+
+		private T next() {
+			if (userVisibleHead == userVisibleTail) {
+				return null;
+			}
+			T ret = buffer.removeFirst();
+			userVisibleHead++;
+
+			sanityCheck();
+			return ret;
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response) {
+			dealWithResponse(response, offset);
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response, long responseOffset) {
+			String responseVersion = response.getVersion();
+			long responseLastCheckpointedOffset = response.getLastCheckpointedOffset();
+			List<T> results;
+			try {
+				results = response.getResults(serializer);
+			} catch (IOException e) {
+				throw new RuntimeException(e);
+			}
+
+			// we first check version in the response to decide whether we should throw away dirty results
+			if (!version.equals(responseVersion)) {
+				// sink restarted, we revert back to where the sink tells us
+				for (long i = 0; i < offset - responseLastCheckpointedOffset; i++) {

Review comment:
       "less than offset" should be "less than or equal to offset", because sink may have restarted before client fetches more results.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * e85225100da7e5d3f06934fa331c1712a0ed4354 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289) 
   * 480bb0e56b40f3dadc7e33f96ce417a38ccc4aca UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574",
       "triggerID" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1588",
       "triggerID" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cc45779cd014b587a2fbed2393683ff4fe73a38b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1600",
       "triggerID" : "cc45779cd014b587a2fbed2393683ff4fe73a38b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * 5ff489bbcfdc393ed5835f5d0183d374cd0acb3f UNKNOWN
   * 579236dbe523ccc71e62f4f9becdf0937d3b1bf4 UNKNOWN
   * 4ad199a9f0a8db9ce82493af6e180af602774e29 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1588) 
   * cc45779cd014b587a2fbed2393683ff4fe73a38b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1600) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * e85225100da7e5d3f06934fa331c1712a0ed4354 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289) 
   * 480bb0e56b40f3dadc7e33f96ce417a38ccc4aca Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543) 
   * 7152ce26d8228590b5449280de54d95880bb1e5b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574",
       "triggerID" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1588",
       "triggerID" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cc45779cd014b587a2fbed2393683ff4fe73a38b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1600",
       "triggerID" : "cc45779cd014b587a2fbed2393683ff4fe73a38b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "db16ab869545c73c16e0a3bec6c4482cb3331619",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1613",
       "triggerID" : "db16ab869545c73c16e0a3bec6c4482cb3331619",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * 5ff489bbcfdc393ed5835f5d0183d374cd0acb3f UNKNOWN
   * 579236dbe523ccc71e62f4f9becdf0937d3b1bf4 UNKNOWN
   * 4ad199a9f0a8db9ce82493af6e180af602774e29 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1588) 
   * cc45779cd014b587a2fbed2393683ff4fe73a38b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1600) 
   * db16ab869545c73c16e0a3bec6c4482cb3331619 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1613) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] KurtYoung commented on a change in pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
KurtYoung commented on a change in pull request #12073:
URL: https://github.com/apache/flink/pull/12073#discussion_r426223511



##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);

Review comment:
       ```suggestion
   				long requestOffset = buffer.offset;
   				CollectCoordinationResponse<T> response;
   				try {
   					response = sendRequest(buffer.version, requestOffset);
   				} catch (Exception e) {
   					LOG.warn("An exception occurs when fetching query results", e);
   					sleepBeforeRetry();
   					continue;
   				}
   				// the response will contain data (if any) starting exactly from requested offset
   				buffer.dealWithResponse(response, requestOffset);
   ```

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	private JobClient jobClient;
+	private boolean terminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	@VisibleForTesting
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.terminated = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (terminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				terminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (terminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+		CoordinationRequestGateway gateway = (CoordinationRequestGateway) jobClient;
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);
+		}
+
+		ArrayList<byte[]> accResults = executionResult.getAccumulatorResult(accumulatorName);
+		if (accResults == null) {
+			// job terminates abnormally
+			return null;
+		}
+
+		try {
+			List<byte[]> serializedResults =
+				SerializedListAccumulator.deserializeList(accResults, BytePrimitiveArraySerializer.INSTANCE);
+			byte[] serializedResult = serializedResults.get(0);
+			return CollectSinkFunction.deserializeAccumulatorResult(serializedResult);
+		} catch (ClassNotFoundException | IOException e) {
+			// this is impossible
+			throw new RuntimeException("Failed to deserialize accumulator results", e);
+		}
+	}
+
+	private boolean isJobTerminated() {
+		checkJobClientConfigured();
+
+		try {
+			JobStatus status = jobClient.getJobStatus().get();
+			return status.isGloballyTerminalState();
+		} catch (Exception e) {
+			// TODO
+			//  This is sort of hack.
+			//  Currently different execution environment will have different behaviors
+			//  when fetching a finished job status.
+			//  For example, standalone session cluster will return a normal FINISHED,
+			//  while mini cluster will throw IllegalStateException,
+			//  and yarn per job will throw ApplicationNotFoundException.
+			//  We have to assume that job has finished in this case.
+			//  Change this when these behaviors are unified.
+			LOG.warn("Failed to get job status so we assume that the job has terminated. Some data might be lost.", e);
+			return true;
+		}
+	}
+
+	private void cancelJob() {
+		checkJobClientConfigured();
+
+		if (!isJobTerminated()) {
+			jobClient.cancel();
+		}
+	}
+
+	private void sleepBeforeRetry() {
+		if (retryMillis <= 0) {
+			return;
+		}
+
+		try {
+			// TODO a more proper retry strategy?
+			Thread.sleep(retryMillis);
+		} catch (InterruptedException e) {
+			LOG.warn("Interrupted when sleeping before a retry", e);
+		}
+	}
+
+	private void checkJobClientConfigured() {
+		Preconditions.checkNotNull(jobClient, "Job client must be configured before first use.");
+	}
+
+	private class ResultBuffer {
+
+		private static final String INIT_VERSION = "";
+
+		private final LinkedList<T> buffer;
+		private final TypeSerializer<T> serializer;
+
+		private String version;
+		private long offset;
+		private long lastCheckpointedOffset;
+		private long userHead;
+		private long userTail;
+
+		private ResultBuffer(TypeSerializer<T> serializer) {
+			this.buffer = new LinkedList<>();
+			this.serializer = serializer;
+
+			this.version = INIT_VERSION;
+			this.offset = 0;
+			this.lastCheckpointedOffset = 0;
+			this.userHead = 0;
+			this.userTail = 0;
+		}
+
+		private T next() {
+			if (userHead == userTail) {
+				return null;
+			}
+			T ret = buffer.removeFirst();
+			userHead++;
+
+			sanityCheck();
+			return ret;
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response) {
+			dealWithResponse(response, offset);
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response, long responseOffset) {
+			String responseVersion = response.getVersion();
+			long responseLastCheckpointedOffset = response.getLastCheckpointedOffset();
+			List<T> results;
+			try {
+				results = response.getResults(serializer);
+			} catch (IOException e) {

Review comment:
       I would prefer to throw it out

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (jobTerminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);
+		}
+
+		ArrayList<byte[]> accResults = executionResult.getAccumulatorResult(accumulatorName);
+		if (accResults == null) {
+			// job terminates abnormally
+			return null;
+		}
+
+		try {
+			List<byte[]> serializedResults =
+				SerializedListAccumulator.deserializeList(accResults, BytePrimitiveArraySerializer.INSTANCE);
+			byte[] serializedResult = serializedResults.get(0);
+			return CollectSinkFunction.deserializeAccumulatorResult(serializedResult);
+		} catch (ClassNotFoundException | IOException e) {
+			// this is impossible
+			throw new RuntimeException("Failed to deserialize accumulator results", e);
+		}
+	}
+
+	private boolean isJobTerminated() {
+		checkJobClientConfigured();
+
+		try {
+			JobStatus status = jobClient.getJobStatus().get();
+			return status.isGloballyTerminalState();
+		} catch (Exception e) {
+			// TODO
+			//  This is sort of hack.
+			//  Currently different execution environment will have different behaviors
+			//  when fetching a finished job status.
+			//  For example, standalone session cluster will return a normal FINISHED,
+			//  while mini cluster will throw IllegalStateException,
+			//  and yarn per job will throw ApplicationNotFoundException.
+			//  We have to assume that job has finished in this case.
+			//  Change this when these behaviors are unified.
+			LOG.warn("Failed to get job status so we assume that the job has terminated. Some data might be lost.", e);
+			return true;
+		}
+	}
+
+	private void cancelJob() {
+		checkJobClientConfigured();
+
+		if (!isJobTerminated()) {
+			jobClient.cancel();
+		}
+	}
+
+	private void sleepBeforeRetry() {
+		if (retryMillis <= 0) {
+			return;
+		}
+
+		try {
+			// TODO a more proper retry strategy?
+			Thread.sleep(retryMillis);
+		} catch (InterruptedException e) {
+			LOG.warn("Interrupted when sleeping before a retry", e);
+		}
+	}
+
+	private void checkJobClientConfigured() {
+		Preconditions.checkNotNull(jobClient, "Job client must be configured before first use.");
+	}
+
+	/**
+	 * A buffer which encapsulates the logic of dealing with the response from the {@link CollectSinkFunction}.
+	 * See Java doc of {@link CollectSinkFunction} for explanation of this communication protocol.
+	 */
+	private class ResultBuffer {
+
+		private static final String INIT_VERSION = "";
+
+		private final LinkedList<T> buffer;
+		private final TypeSerializer<T> serializer;
+
+		// for detailed explanation of the following 3 variables, see Java doc of CollectSinkFunction
+		// `version` is to check if the sink restarts
+		private String version;
+		// `offset` is the offset of the next result we want to fetch
+		private long offset;
+
+		// userVisibleHead <= user visible results offset < userVisibleTail
+		private long userVisibleHead;
+		private long userVisibleTail;
+
+		private ResultBuffer(TypeSerializer<T> serializer) {
+			this.buffer = new LinkedList<>();
+			this.serializer = serializer;
+
+			this.version = INIT_VERSION;
+			this.offset = 0;
+
+			this.userVisibleHead = 0;
+			this.userVisibleTail = 0;
+		}
+
+		private T next() {
+			if (userVisibleHead == userVisibleTail) {
+				return null;
+			}
+			T ret = buffer.removeFirst();
+			userVisibleHead++;
+
+			sanityCheck();
+			return ret;
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response) {
+			dealWithResponse(response, offset);
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response, long responseOffset) {
+			String responseVersion = response.getVersion();
+			long responseLastCheckpointedOffset = response.getLastCheckpointedOffset();
+			List<T> results;
+			try {
+				results = response.getResults(serializer);
+			} catch (IOException e) {
+				throw new RuntimeException(e);
+			}
+
+			// we first check version in the response to decide whether we should throw away dirty results
+			if (!version.equals(responseVersion)) {
+				// sink restarted, we revert back to where the sink tells us
+				for (long i = 0; i < offset - responseLastCheckpointedOffset; i++) {

Review comment:
       add a sanity check that `responseLastCheckpointedOffset` is less than offset. 
   and also checks that this buffer still contains data starting from `responseLastCheckpointedOffset`

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (jobTerminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);
+		}
+
+		ArrayList<byte[]> accResults = executionResult.getAccumulatorResult(accumulatorName);
+		if (accResults == null) {
+			// job terminates abnormally
+			return null;
+		}
+
+		try {
+			List<byte[]> serializedResults =
+				SerializedListAccumulator.deserializeList(accResults, BytePrimitiveArraySerializer.INSTANCE);
+			byte[] serializedResult = serializedResults.get(0);
+			return CollectSinkFunction.deserializeAccumulatorResult(serializedResult);
+		} catch (ClassNotFoundException | IOException e) {
+			// this is impossible
+			throw new RuntimeException("Failed to deserialize accumulator results", e);
+		}
+	}
+
+	private boolean isJobTerminated() {
+		checkJobClientConfigured();
+
+		try {
+			JobStatus status = jobClient.getJobStatus().get();
+			return status.isGloballyTerminalState();
+		} catch (Exception e) {
+			// TODO
+			//  This is sort of hack.
+			//  Currently different execution environment will have different behaviors
+			//  when fetching a finished job status.
+			//  For example, standalone session cluster will return a normal FINISHED,
+			//  while mini cluster will throw IllegalStateException,
+			//  and yarn per job will throw ApplicationNotFoundException.
+			//  We have to assume that job has finished in this case.
+			//  Change this when these behaviors are unified.
+			LOG.warn("Failed to get job status so we assume that the job has terminated. Some data might be lost.", e);
+			return true;
+		}
+	}
+
+	private void cancelJob() {
+		checkJobClientConfigured();
+
+		if (!isJobTerminated()) {
+			jobClient.cancel();
+		}
+	}
+
+	private void sleepBeforeRetry() {
+		if (retryMillis <= 0) {
+			return;
+		}
+
+		try {
+			// TODO a more proper retry strategy?
+			Thread.sleep(retryMillis);
+		} catch (InterruptedException e) {
+			LOG.warn("Interrupted when sleeping before a retry", e);
+		}
+	}
+
+	private void checkJobClientConfigured() {
+		Preconditions.checkNotNull(jobClient, "Job client must be configured before first use.");
+	}
+
+	/**
+	 * A buffer which encapsulates the logic of dealing with the response from the {@link CollectSinkFunction}.
+	 * See Java doc of {@link CollectSinkFunction} for explanation of this communication protocol.
+	 */
+	private class ResultBuffer {
+
+		private static final String INIT_VERSION = "";
+
+		private final LinkedList<T> buffer;
+		private final TypeSerializer<T> serializer;
+
+		// for detailed explanation of the following 3 variables, see Java doc of CollectSinkFunction
+		// `version` is to check if the sink restarts
+		private String version;
+		// `offset` is the offset of the next result we want to fetch
+		private long offset;
+
+		// userVisibleHead <= user visible results offset < userVisibleTail
+		private long userVisibleHead;
+		private long userVisibleTail;
+
+		private ResultBuffer(TypeSerializer<T> serializer) {
+			this.buffer = new LinkedList<>();
+			this.serializer = serializer;
+
+			this.version = INIT_VERSION;
+			this.offset = 0;
+
+			this.userVisibleHead = 0;
+			this.userVisibleTail = 0;
+		}
+
+		private T next() {
+			if (userVisibleHead == userVisibleTail) {
+				return null;
+			}
+			T ret = buffer.removeFirst();
+			userVisibleHead++;
+
+			sanityCheck();
+			return ret;
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response) {
+			dealWithResponse(response, offset);
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response, long responseOffset) {
+			String responseVersion = response.getVersion();
+			long responseLastCheckpointedOffset = response.getLastCheckpointedOffset();
+			List<T> results;
+			try {
+				results = response.getResults(serializer);
+			} catch (IOException e) {
+				throw new RuntimeException(e);
+			}
+
+			// we first check version in the response to decide whether we should throw away dirty results
+			if (!version.equals(responseVersion)) {
+				// sink restarted, we revert back to where the sink tells us
+				for (long i = 0; i < offset - responseLastCheckpointedOffset; i++) {
+					buffer.removeLast();
+				}
+				version = responseVersion;
+				offset = responseLastCheckpointedOffset;
+			}
+
+			// we now check if more results can be seen by the user
+			if (responseLastCheckpointedOffset > userVisibleTail) {
+				// lastCheckpointedOffset increases, this means that more results have been
+				// checkpointed, and we can give these results to the user
+				userVisibleTail = responseLastCheckpointedOffset;
+			}
+
+			if (!results.isEmpty()) {
+				// response contains some data, add them to buffer
+				int addStart = (int) (offset - responseOffset);
+				List<T> addedResults = results.subList(addStart, results.size());

Review comment:
       what about if `responseOffset + results.size()` still less than offset, which means you get a fully duplicated data?

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (jobTerminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);
+		}
+
+		ArrayList<byte[]> accResults = executionResult.getAccumulatorResult(accumulatorName);
+		if (accResults == null) {
+			// job terminates abnormally
+			return null;
+		}
+
+		try {
+			List<byte[]> serializedResults =
+				SerializedListAccumulator.deserializeList(accResults, BytePrimitiveArraySerializer.INSTANCE);
+			byte[] serializedResult = serializedResults.get(0);
+			return CollectSinkFunction.deserializeAccumulatorResult(serializedResult);
+		} catch (ClassNotFoundException | IOException e) {
+			// this is impossible
+			throw new RuntimeException("Failed to deserialize accumulator results", e);
+		}
+	}
+
+	private boolean isJobTerminated() {
+		checkJobClientConfigured();
+
+		try {
+			JobStatus status = jobClient.getJobStatus().get();
+			return status.isGloballyTerminalState();
+		} catch (Exception e) {
+			// TODO
+			//  This is sort of hack.
+			//  Currently different execution environment will have different behaviors
+			//  when fetching a finished job status.
+			//  For example, standalone session cluster will return a normal FINISHED,
+			//  while mini cluster will throw IllegalStateException,
+			//  and yarn per job will throw ApplicationNotFoundException.
+			//  We have to assume that job has finished in this case.
+			//  Change this when these behaviors are unified.
+			LOG.warn("Failed to get job status so we assume that the job has terminated. Some data might be lost.", e);
+			return true;
+		}
+	}
+
+	private void cancelJob() {
+		checkJobClientConfigured();
+
+		if (!isJobTerminated()) {
+			jobClient.cancel();
+		}
+	}
+
+	private void sleepBeforeRetry() {
+		if (retryMillis <= 0) {
+			return;
+		}
+
+		try {
+			// TODO a more proper retry strategy?
+			Thread.sleep(retryMillis);
+		} catch (InterruptedException e) {
+			LOG.warn("Interrupted when sleeping before a retry", e);
+		}
+	}
+
+	private void checkJobClientConfigured() {
+		Preconditions.checkNotNull(jobClient, "Job client must be configured before first use.");
+	}
+
+	/**
+	 * A buffer which encapsulates the logic of dealing with the response from the {@link CollectSinkFunction}.
+	 * See Java doc of {@link CollectSinkFunction} for explanation of this communication protocol.
+	 */
+	private class ResultBuffer {
+
+		private static final String INIT_VERSION = "";
+
+		private final LinkedList<T> buffer;
+		private final TypeSerializer<T> serializer;
+
+		// for detailed explanation of the following 3 variables, see Java doc of CollectSinkFunction
+		// `version` is to check if the sink restarts
+		private String version;
+		// `offset` is the offset of the next result we want to fetch
+		private long offset;
+
+		// userVisibleHead <= user visible results offset < userVisibleTail
+		private long userVisibleHead;
+		private long userVisibleTail;
+
+		private ResultBuffer(TypeSerializer<T> serializer) {
+			this.buffer = new LinkedList<>();
+			this.serializer = serializer;
+
+			this.version = INIT_VERSION;
+			this.offset = 0;
+
+			this.userVisibleHead = 0;
+			this.userVisibleTail = 0;
+		}
+
+		private T next() {
+			if (userVisibleHead == userVisibleTail) {
+				return null;
+			}
+			T ret = buffer.removeFirst();
+			userVisibleHead++;
+
+			sanityCheck();
+			return ret;
+		}
+
+		private void dealWithResponse(CollectCoordinationResponse<T> response) {

Review comment:
       delete this method

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (jobTerminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);

Review comment:
       throw IOException istead

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (jobTerminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);
+		}
+
+		ArrayList<byte[]> accResults = executionResult.getAccumulatorResult(accumulatorName);
+		if (accResults == null) {
+			// job terminates abnormally

Review comment:
       should let user know this is a abnormal case, throw an exception?

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (jobTerminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {

Review comment:
       change this to `do {} while`, maybe we can reuse these statements with #110 - #117?

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (jobTerminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {

Review comment:
       ```suggestion
   	private Tuple2<Long, CollectCoordinationResponse<T>> getAccumulatorResults() {
   ```

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (jobTerminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);
+		}
+
+		ArrayList<byte[]> accResults = executionResult.getAccumulatorResult(accumulatorName);
+		if (accResults == null) {
+			// job terminates abnormally
+			return null;
+		}
+
+		try {
+			List<byte[]> serializedResults =
+				SerializedListAccumulator.deserializeList(accResults, BytePrimitiveArraySerializer.INSTANCE);
+			byte[] serializedResult = serializedResults.get(0);
+			return CollectSinkFunction.deserializeAccumulatorResult(serializedResult);
+		} catch (ClassNotFoundException | IOException e) {
+			// this is impossible
+			throw new RuntimeException("Failed to deserialize accumulator results", e);
+		}
+	}
+
+	private boolean isJobTerminated() {
+		checkJobClientConfigured();
+
+		try {
+			JobStatus status = jobClient.getJobStatus().get();
+			return status.isGloballyTerminalState();
+		} catch (Exception e) {
+			// TODO
+			//  This is sort of hack.
+			//  Currently different execution environment will have different behaviors
+			//  when fetching a finished job status.
+			//  For example, standalone session cluster will return a normal FINISHED,
+			//  while mini cluster will throw IllegalStateException,
+			//  and yarn per job will throw ApplicationNotFoundException.
+			//  We have to assume that job has finished in this case.
+			//  Change this when these behaviors are unified.
+			LOG.warn("Failed to get job status so we assume that the job has terminated. Some data might be lost.", e);
+			return true;
+		}
+	}
+
+	private void cancelJob() {

Review comment:
       Before we throw any exception, we should also try to cancel the job

##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (jobTerminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {
+				// still no results, but job is still running, retry
+				sleepBeforeRetry();
+			}
+		}
+	}
+
+	public void close() {
+		if (closed) {
+			return;
+		}
+
+		cancelJob();
+		closed = true;
+	}
+
+	@Override
+	protected void finalize() throws Throwable {
+		// in case that user neither reads all data nor closes the iterator
+		close();
+	}
+
+	@SuppressWarnings("unchecked")
+	private CollectCoordinationResponse<T> sendRequest(
+			String version,
+			long offset) throws InterruptedException, ExecutionException {
+		checkJobClientConfigured();
+
+		OperatorID operatorId = operatorIdFuture.getNow(null);
+		Preconditions.checkNotNull(operatorId, "Unknown operator ID. This is a bug.");
+
+		CollectCoordinationRequest request = new CollectCoordinationRequest(version, offset);
+		return (CollectCoordinationResponse) gateway.sendCoordinationRequest(operatorId, request).get();
+	}
+
+	@Nullable
+	private Tuple2<Long, CollectCoordinationResponse> getAccumulatorResults() {
+		checkJobClientConfigured();
+
+		JobExecutionResult executionResult;
+		try {
+			// this timeout is sort of hack, see comments in isJobTerminated for explanation
+			executionResult = jobClient.getJobExecutionResult(getClass().getClassLoader()).get(
+				DEFAULT_ACCUMULATOR_GET_MILLIS, TimeUnit.MILLISECONDS);
+		} catch (InterruptedException | ExecutionException | TimeoutException e) {
+			throw new RuntimeException("Failed to fetch job execution result", e);
+		}
+
+		ArrayList<byte[]> accResults = executionResult.getAccumulatorResult(accumulatorName);
+		if (accResults == null) {
+			// job terminates abnormally
+			return null;
+		}
+
+		try {
+			List<byte[]> serializedResults =
+				SerializedListAccumulator.deserializeList(accResults, BytePrimitiveArraySerializer.INSTANCE);
+			byte[] serializedResult = serializedResults.get(0);
+			return CollectSinkFunction.deserializeAccumulatorResult(serializedResult);
+		} catch (ClassNotFoundException | IOException e) {
+			// this is impossible
+			throw new RuntimeException("Failed to deserialize accumulator results", e);
+		}
+	}
+
+	private boolean isJobTerminated() {
+		checkJobClientConfigured();
+
+		try {
+			JobStatus status = jobClient.getJobStatus().get();
+			return status.isGloballyTerminalState();
+		} catch (Exception e) {
+			// TODO
+			//  This is sort of hack.
+			//  Currently different execution environment will have different behaviors
+			//  when fetching a finished job status.
+			//  For example, standalone session cluster will return a normal FINISHED,
+			//  while mini cluster will throw IllegalStateException,
+			//  and yarn per job will throw ApplicationNotFoundException.
+			//  We have to assume that job has finished in this case.
+			//  Change this when these behaviors are unified.
+			LOG.warn("Failed to get job status so we assume that the job has terminated. Some data might be lost.", e);
+			return true;
+		}
+	}
+
+	private void cancelJob() {
+		checkJobClientConfigured();
+
+		if (!isJobTerminated()) {
+			jobClient.cancel();
+		}
+	}
+
+	private void sleepBeforeRetry() {
+		if (retryMillis <= 0) {
+			return;
+		}
+
+		try {
+			// TODO a more proper retry strategy?
+			Thread.sleep(retryMillis);
+		} catch (InterruptedException e) {
+			LOG.warn("Interrupted when sleeping before a retry", e);
+		}
+	}
+
+	private void checkJobClientConfigured() {
+		Preconditions.checkNotNull(jobClient, "Job client must be configured before first use.");

Review comment:
       also check the gateway




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574",
       "triggerID" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1588",
       "triggerID" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cc45779cd014b587a2fbed2393683ff4fe73a38b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "cc45779cd014b587a2fbed2393683ff4fe73a38b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * ca26d6edd7772ee46d24b05c01952a10887eb3f7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574) 
   * 5ff489bbcfdc393ed5835f5d0183d374cd0acb3f UNKNOWN
   * 579236dbe523ccc71e62f4f9becdf0937d3b1bf4 UNKNOWN
   * 4ad199a9f0a8db9ce82493af6e180af602774e29 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1588) 
   * cc45779cd014b587a2fbed2393683ff4fe73a38b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-14807][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4f221bd54f57e1d67cf65527de1bee3b0473bdb6 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031) 
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * e85225100da7e5d3f06934fa331c1712a0ed4354 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574",
       "triggerID" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "579236dbe523ccc71e62f4f9becdf0937d3b1bf4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1588",
       "triggerID" : "4ad199a9f0a8db9ce82493af6e180af602774e29",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cc45779cd014b587a2fbed2393683ff4fe73a38b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1600",
       "triggerID" : "cc45779cd014b587a2fbed2393683ff4fe73a38b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "db16ab869545c73c16e0a3bec6c4482cb3331619",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1613",
       "triggerID" : "db16ab869545c73c16e0a3bec6c4482cb3331619",
       "triggerType" : "PUSH"
     }, {
       "hash" : "41d3524e0ba410d80f17f86206191b146d34e460",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "41d3524e0ba410d80f17f86206191b146d34e460",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * 5ff489bbcfdc393ed5835f5d0183d374cd0acb3f UNKNOWN
   * 579236dbe523ccc71e62f4f9becdf0937d3b1bf4 UNKNOWN
   * 4ad199a9f0a8db9ce82493af6e180af602774e29 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1588) 
   * cc45779cd014b587a2fbed2393683ff4fe73a38b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1600) 
   * db16ab869545c73c16e0a3bec6c4482cb3331619 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1613) 
   * 41d3524e0ba410d80f17f86206191b146d34e460 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574",
       "triggerID" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5ff489bbcfdc393ed5835f5d0183d374cd0acb3f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * ca26d6edd7772ee46d24b05c01952a10887eb3f7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1574) 
   * 5ff489bbcfdc393ed5835f5d0183d374cd0acb3f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] TsReaper commented on a change in pull request #12073: [FLINK-17735][streaming] Add specialized collecting iterator

Posted by GitBox <gi...@apache.org>.
TsReaper commented on a change in pull request #12073:
URL: https://github.com/apache/flink/pull/12073#discussion_r426230954



##########
File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/CollectResultFetcher.java
##########
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators.collect;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.accumulators.SerializedListAccumulator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.execution.JobClient;
+import org.apache.flink.runtime.jobgraph.OperatorID;
+import org.apache.flink.runtime.operators.coordination.CoordinationRequestGateway;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A fetcher which fetches query results from sink and provides exactly-once semantics.
+ */
+public class CollectResultFetcher<T> {
+
+	private static final int DEFAULT_RETRY_MILLIS = 100;
+	private static final long DEFAULT_ACCUMULATOR_GET_MILLIS = 10000;
+
+	private static final Logger LOG = LoggerFactory.getLogger(CollectResultFetcher.class);
+
+	private final CompletableFuture<OperatorID> operatorIdFuture;
+	private final String accumulatorName;
+	private final int retryMillis;
+
+	private ResultBuffer buffer;
+
+	@Nullable
+	private JobClient jobClient;
+	@Nullable
+	private CoordinationRequestGateway gateway;
+
+	private boolean jobTerminated;
+	private boolean closed;
+
+	public CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName) {
+		this(
+			operatorIdFuture,
+			serializer,
+			accumulatorName,
+			DEFAULT_RETRY_MILLIS);
+	}
+
+	CollectResultFetcher(
+			CompletableFuture<OperatorID> operatorIdFuture,
+			TypeSerializer<T> serializer,
+			String accumulatorName,
+			int retryMillis) {
+		this.operatorIdFuture = operatorIdFuture;
+		this.accumulatorName = accumulatorName;
+		this.retryMillis = retryMillis;
+
+		this.buffer = new ResultBuffer(serializer);
+
+		this.jobTerminated = false;
+		this.closed = false;
+	}
+
+	public void setJobClient(JobClient jobClient) {
+		Preconditions.checkArgument(
+			jobClient instanceof CoordinationRequestGateway,
+			"Job client must be a CoordinationRequestGateway. This is a bug.");
+		this.jobClient = jobClient;
+		this.gateway = (CoordinationRequestGateway) jobClient;
+	}
+
+	@SuppressWarnings("unchecked")
+	public T next() {
+		if (closed) {
+			return null;
+		}
+
+		T res = buffer.next();
+		if (res != null) {
+			// we still have user-visible results, just use them
+			return res;
+		} else if (jobTerminated) {
+			// no user-visible results, but job has terminated, we have to return
+			return null;
+		}
+
+		// we're going to fetch some more
+		while (true) {
+			if (isJobTerminated()) {
+				// job terminated, read results from accumulator
+				jobTerminated = true;
+				Tuple2<Long, CollectCoordinationResponse> accResults = getAccumulatorResults();
+				if (accResults != null) {
+					buffer.dealWithResponse(accResults.f1, accResults.f0);
+				}
+				buffer.complete();
+			} else {
+				// job still running, try to fetch some results
+				CollectCoordinationResponse<T> response;
+				try {
+					response = sendRequest(buffer.version, buffer.offset);
+				} catch (Exception e) {
+					LOG.warn("An exception occurs when fetching query results", e);
+					sleepBeforeRetry();
+					continue;
+				}
+				buffer.dealWithResponse(response);
+			}
+
+			// try to return results after fetching
+			res = buffer.next();
+			if (res != null) {
+				// ok, we have results this time
+				return res;
+			} else if (jobTerminated) {
+				// still no results, but job has terminated, we have to return
+				return null;
+			} else {

Review comment:
       This will cause iterator to sleep before the first try, which is undesired.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12073: [FLINK-17735][table] Add specialized collecting iterator to Blink planner

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12073:
URL: https://github.com/apache/flink/pull/12073#issuecomment-626524463


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=954",
       "triggerID" : "bfcf9f36fe1ca742b962367eaf85af68a5c94962",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1031",
       "triggerID" : "4f221bd54f57e1d67cf65527de1bee3b0473bdb6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1289",
       "triggerID" : "e85225100da7e5d3f06934fa331c1712a0ed4354",
       "triggerType" : "PUSH"
     }, {
       "hash" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1543",
       "triggerID" : "480bb0e56b40f3dadc7e33f96ce417a38ccc4aca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565",
       "triggerID" : "7152ce26d8228590b5449280de54d95880bb1e5b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ca26d6edd7772ee46d24b05c01952a10887eb3f7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 424d752d1d5e49c752ccd79561ce5cfcd5ea7d1d UNKNOWN
   * 7152ce26d8228590b5449280de54d95880bb1e5b Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1565) 
   * ca26d6edd7772ee46d24b05c01952a10887eb3f7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org