Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2018/11/20 19:58:38 UTC

[GitHub] tweise commented on a change in pull request #6980: [FLINK-5697] [kinesis] Add periodic per-shard watermark support

URL: https://github.com/apache/flink/pull/6980#discussion_r235149070
 
 

 ##########
 File path: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/KinesisDataFetcher.java
 ##########
 @@ -609,7 +667,115 @@ public int registerNewSubscribedShardState(KinesisStreamShardState newSubscribed
 				this.numberOfActiveShards.incrementAndGet();
 			}
 
-			return subscribedShardsState.size() - 1;
+			int shardStateIndex = subscribedShardsState.size() - 1;
+
+			// track all discovered shards for watermark determination
+			ShardWatermarkState sws = shardWatermarks.get(shardStateIndex);
+			if (sws == null) {
+				sws = new ShardWatermarkState();
+				try {
+					sws.periodicWatermarkAssigner = InstantiationUtil.clone(periodicWatermarkAssigner);
+				} catch (Exception e) {
+					throw new RuntimeException(e);
+				}
+				sws.lastUpdated = getCurrentTimeMillis();
+				sws.lastRecordTimestamp = Long.MIN_VALUE;
+				shardWatermarks.put(shardStateIndex, sws);
+			}
+
+			return shardStateIndex;
+		}
+	}
+
+	/**
+	 * Return the current system time. Allow tests to override this to simulate progress for watermark
+	 * logic.
+	 *
+	 * @return
+	 */
+	@VisibleForTesting
+	protected long getCurrentTimeMillis() {
+		return System.currentTimeMillis();
+	}
+
+	/**
+	 * Called periodically to emit a watermark. Checks all shards for the current event time
+	 * watermark, and possibly emits the next watermark.
+	 *
+	 * <p>Shards that have not received an update for a certain interval are considered inactive so as
+	 * to not hold back the watermark indefinitely. When all shards are inactive, the subtask will be
+	 * marked as temporarily idle to not block downstream operators.
+	 */
+	@VisibleForTesting
+	protected void emitWatermark() {
+		LOG.debug(
+			"###evaluating watermark for subtask {} time {}",
+			indexOfThisConsumerSubtask,
+			getCurrentTimeMillis());
+		long potentialWatermark = Long.MAX_VALUE;
+		long idleTime =
+			(shardIdleIntervalMillis > 0)
+				? getCurrentTimeMillis() - shardIdleIntervalMillis
+				: Long.MAX_VALUE;
+
+		for (Map.Entry<Integer, ShardWatermarkState> e : shardWatermarks.entrySet()) {
+			// consider only active shards, or those that would advance the watermark
+			Watermark w = e.getValue().periodicWatermarkAssigner.getCurrentWatermark();
+			if (w != null && (e.getValue().lastUpdated >= idleTime || w.getTimestamp() > lastWatermark)) {
+				potentialWatermark = Math.min(potentialWatermark, w.getTimestamp());
+			}
+		}
+
+		// advance watermark if possible (watermarks can only be ascending)
+		if (potentialWatermark == Long.MAX_VALUE) {
 
 Review comment:
   `potentialWatermark` is computed by the preceding loop over the shards. The idle handling should only run when that loop found no candidate, i.e. when `potentialWatermark` is still `Long.MAX_VALUE`.
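
   To illustrate the dependency, here is a minimal, self-contained sketch of the selection loop from the diff above. It is not the connector code: `ShardWatermarkSketch`, `ShardState`, `potentialWatermark` and `allShardsIdle` are hypothetical names, and each shard's watermark is reduced to a plain `long` instead of a cloned periodic watermark assigner. The idle branch is reached only when no shard contributed a candidate.

```java
import java.util.Map;

// Hypothetical sketch; names and types are assumptions, not part of the PR.
final class ShardWatermarkSketch {

    /** Simplified stand-in for the per-shard state tracked in the diff. */
    static final class ShardState {
        final long watermark;   // watermark currently reported for the shard
        final long lastUpdated; // wall-clock millis of the shard's last update

        ShardState(long watermark, long lastUpdated) {
            this.watermark = watermark;
            this.lastUpdated = lastUpdated;
        }
    }

    /**
     * Mirrors the loop in the diff: the minimum watermark over shards that are
     * either recently updated or would advance the last emitted watermark.
     * Returns Long.MAX_VALUE when no shard qualifies.
     */
    static long potentialWatermark(
            Map<Integer, ShardState> shards,
            long now,
            long shardIdleIntervalMillis,
            long lastWatermark) {
        long idleTime = (shardIdleIntervalMillis > 0)
                ? now - shardIdleIntervalMillis
                : Long.MAX_VALUE;
        long potential = Long.MAX_VALUE;
        for (ShardState s : shards.values()) {
            if (s.lastUpdated >= idleTime || s.watermark > lastWatermark) {
                potential = Math.min(potential, s.watermark);
            }
        }
        return potential;
    }

    /** The idle condition only applies when the loop found no candidate. */
    static boolean allShardsIdle(long potential) {
        return potential == Long.MAX_VALUE;
    }
}
```

   With one recently updated shard and one stale shard that is behind the last emitted watermark, only the recent shard's value is a candidate. If every shard is stale and none would advance the watermark, the result stays at `Long.MAX_VALUE`, and that is the only case in which the idle handling should run.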
