You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/01/03 03:39:20 UTC

[GitHub] [flink] TisonKun opened a new pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

TisonKun opened a new pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754
 
 
   ## What is the purpose of the change
   
   `ZKCheckpointIDCounter` doesn't tolerate ZK suspended & reconnected while it could do. This causes that job can not trigger checkpoint forever after zookeeper change leader.
   
   ## Brief change log
   
   Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK.
   
   ## Verifying this change
   
   This change is a trivial fix that can be reasoned by code.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#discussion_r366360538
 
 

 ##########
 File path: flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/ZKCheckpointIDCounterMultiServersTest.java
 ##########
 @@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.checkpoint;
+
+import org.apache.flink.runtime.zookeeper.ZooKeeperTestEnvironment;
+import org.apache.flink.util.TestLogger;
+
+import org.apache.curator.framework.CuratorFramework;
+import org.apache.curator.framework.state.ConnectionState;
+import org.apache.curator.test.TestingCluster;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.concurrent.atomic.AtomicLong;
+
+import static org.hamcrest.CoreMatchers.is;
+import static org.hamcrest.CoreMatchers.notNullValue;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link ZooKeeperCheckpointIDCounter} in a ZooKeeper ensemble.
+ */
+public final class ZKCheckpointIDCounterMultiServersTest extends TestLogger {
+
+	private static final ZooKeeperTestEnvironment ZOOKEEPER = new ZooKeeperTestEnvironment(3);
+
+	@AfterClass
+	public static void tearDown() throws Exception {
+		ZOOKEEPER.shutdown();
+	}
+
+	@Before
+	public void cleanUp() throws Exception {
+		ZOOKEEPER.deleteAll();
+	}
 
 Review comment:
   Instead of the `ZooKeeperTestEnvironment` I would recommend using the `ZooKeeperResource`. Combining this with the `@Rule` will replace the `AfterClass` and `Before` methods. Maybe one needs to make the `TestingServer` accessible, though.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#discussion_r366394875
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCheckpointIDCounter.java
 ##########
 @@ -77,14 +84,26 @@ public ZooKeeperCheckpointIDCounter(CuratorFramework client, String counterPath)
 		this.client = checkNotNull(client, "Curator client");
 		this.counterPath = checkNotNull(counterPath, "Counter path");
 		this.sharedCount = new SharedCount(client, counterPath, 1);
+
+		this.connectionStateListeners = new ArrayList<>();
+		this.connectionStateListeners.add((ignore, newState) -> lastState = newState);
+	}
+
+	@VisibleForTesting
+	ZooKeeperCheckpointIDCounter(CuratorFramework client, String counterPath, Collection<ConnectionStateListener> listeners) {
+		this(client, counterPath);
+		this.connectionStateListeners.addAll(listeners);
 	}
 
 	@Override
 	public void start() throws Exception {
 		synchronized (startStopLock) {
 			if (!isStarted) {
 				sharedCount.start();
-				client.getConnectionStateListenable().addListener(connStateListener);
+
+				for (ConnectionStateListener listener : connectionStateListeners) {
+					client.getConnectionStateListenable().addListener(listener);
 
 Review comment:
   Since `getConnectionStateListenable` does not guarantee the order in which the listener are called, the added test case is unstable (if the testing listener is called before the one which sets `lastState`). I suggest to introduce a `LastStateConnectionStateListener` (similar to `SharedCountConnectionStateListener`) which we pass into the class.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#discussion_r366362147
 
 

 ##########
 File path: flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/ZKCheckpointIDCounterMultiServersTest.java
 ##########
 @@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.checkpoint;
+
+import org.apache.flink.runtime.zookeeper.ZooKeeperTestEnvironment;
+import org.apache.flink.util.TestLogger;
+
+import org.apache.curator.framework.CuratorFramework;
+import org.apache.curator.framework.state.ConnectionState;
+import org.apache.curator.test.TestingCluster;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.concurrent.atomic.AtomicLong;
+
+import static org.hamcrest.CoreMatchers.is;
+import static org.hamcrest.CoreMatchers.notNullValue;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link ZooKeeperCheckpointIDCounter} in a ZooKeeper ensemble.
+ */
+public final class ZKCheckpointIDCounterMultiServersTest extends TestLogger {
+
+	private static final ZooKeeperTestEnvironment ZOOKEEPER = new ZooKeeperTestEnvironment(3);
+
+	@AfterClass
+	public static void tearDown() throws Exception {
+		ZOOKEEPER.shutdown();
+	}
+
+	@Before
+	public void cleanUp() throws Exception {
+		ZOOKEEPER.deleteAll();
+	}
+
+	/**
+	 * Tests that {@link ZooKeeperCheckpointIDCounter} can be recovered after a
+	 * connection loss exception from ZooKeeper ensemble.
+	 *
+	 * See also FLINK-14091.
+	 */
+	@Test
+	public void testRecoveredAfterConnectionLoss() throws Exception {
+		CuratorFramework client = ZOOKEEPER.getClient();
+
+		ZooKeeperCheckpointIDCounter idCounter = new ZooKeeperCheckpointIDCounter(client, "/checkpoint-id-counter");
+		idCounter.start();
+
+		AtomicLong localCounter = new AtomicLong(1L);
+
+		assertThat(
+			"ZooKeeperCheckpointIDCounter doesn't properly work.",
+			idCounter.getAndIncrement(),
+			is(localCounter.getAndIncrement()));
+
+		TestingCluster cluster = ZOOKEEPER.getZooKeeperCluster();
+		assertThat(cluster, is(notNullValue()));
+
+		// close the server this client connected to, which triggers a connection loss exception
+		cluster.restartServer(cluster.findConnectionInstance(client.getZookeeperClient().getZooKeeper()));
+
+		// encountered connected loss, this prevents us from getting false positive
+		while (true) {
+			try {
+				idCounter.get();
+			} catch (IllegalStateException ignore) {
+				log.debug("Encountered connection loss.");
+				break;
+			}
+		}
+
+		// recovered from connection loss
+		while (true) {
+			try {
+				long id = idCounter.get();
+				assertThat(id, is(localCounter.get()));
+				break;
+			} catch (IllegalStateException ignore) {
+				log.debug("During ZooKeeper client reconnecting...");
+			}
+		}
+
+		assertThat(idCounter.getLastState(), is(ConnectionState.RECONNECTED));
 
 Review comment:
   I think it is not important for the test to ensure that some internal state is `RECONNECTED`. What we should try to test is that we can increment the ID counter under loss of connection but it is not important how exactly this works.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] TisonKun commented on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
TisonKun commented on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-574553764
 
 
   Thanks for reviewing and merging this patch!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-570453916
 
 
   <!--
   Meta data
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/142955301 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/144292982 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/144299105 TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4323 TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   -->
   ## CI report:
   
   * 760a02385546fc92d8727c0643fdbbbb3dc3c1ca Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142955301) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058) 
   * 7bfd2b1923422ea452f8e633a146273f4bb93469 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/144292982) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321) 
   * 4d0d330bc1abf80660f5e77e5c3c891f863c226f Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/144299105) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4323) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot commented on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-570453916
 
 
   <!--
   Meta data
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:UNKNOWN URL:TBD TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   -->
   ## CI report:
   
   * 760a02385546fc92d8727c0643fdbbbb3dc3c1ca UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-570453916
 
 
   <!--
   Meta data
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:PENDING URL:https://travis-ci.com/flink-ci/flink/builds/142955301 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   -->
   ## CI report:
   
   * 760a02385546fc92d8727c0643fdbbbb3dc3c1ca Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/142955301) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#discussion_r366360933
 
 

 ##########
 File path: flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/ZKCheckpointIDCounterMultiServersTest.java
 ##########
 @@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.checkpoint;
+
+import org.apache.flink.runtime.zookeeper.ZooKeeperTestEnvironment;
+import org.apache.flink.util.TestLogger;
+
+import org.apache.curator.framework.CuratorFramework;
+import org.apache.curator.framework.state.ConnectionState;
+import org.apache.curator.test.TestingCluster;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.concurrent.atomic.AtomicLong;
+
+import static org.hamcrest.CoreMatchers.is;
+import static org.hamcrest.CoreMatchers.notNullValue;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link ZooKeeperCheckpointIDCounter} in a ZooKeeper ensemble.
+ */
+public final class ZKCheckpointIDCounterMultiServersTest extends TestLogger {
+
+	private static final ZooKeeperTestEnvironment ZOOKEEPER = new ZooKeeperTestEnvironment(3);
+
+	@AfterClass
+	public static void tearDown() throws Exception {
+		ZOOKEEPER.shutdown();
+	}
+
+	@Before
+	public void cleanUp() throws Exception {
+		ZOOKEEPER.deleteAll();
+	}
+
+	/**
+	 * Tests that {@link ZooKeeperCheckpointIDCounter} can be recovered after a
+	 * connection loss exception from ZooKeeper ensemble.
+	 *
+	 * See also FLINK-14091.
+	 */
+	@Test
+	public void testRecoveredAfterConnectionLoss() throws Exception {
+		CuratorFramework client = ZOOKEEPER.getClient();
+
+		ZooKeeperCheckpointIDCounter idCounter = new ZooKeeperCheckpointIDCounter(client, "/checkpoint-id-counter");
+		idCounter.start();
+
+		AtomicLong localCounter = new AtomicLong(1L);
+
+		assertThat(
+			"ZooKeeperCheckpointIDCounter doesn't properly work.",
+			idCounter.getAndIncrement(),
+			is(localCounter.getAndIncrement()));
+
+		TestingCluster cluster = ZOOKEEPER.getZooKeeperCluster();
+		assertThat(cluster, is(notNullValue()));
+
+		// close the server this client connected to, which triggers a connection loss exception
+		cluster.restartServer(cluster.findConnectionInstance(client.getZookeeperClient().getZooKeeper()));
+
+		// encountered connected loss, this prevents us from getting false positive
+		while (true) {
+			try {
+				idCounter.get();
+			} catch (IllegalStateException ignore) {
+				log.debug("Encountered connection loss.");
+				break;
+			}
+		}
 
 Review comment:
   Are you sure that this always happens? This looks quite brittle to me. What if the restart is so fast that the client does not lose its connection?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-570453916
 
 
   <!--
   Meta data
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/142955301 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/144292982 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/144299105 TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4323 TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   -->
   ## CI report:
   
   * 760a02385546fc92d8727c0643fdbbbb3dc3c1ca Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142955301) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058) 
   * 7bfd2b1923422ea452f8e633a146273f4bb93469 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/144292982) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321) 
   * 4d0d330bc1abf80660f5e77e5c3c891f863c226f Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/144299105) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4323) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-570453916
 
 
   <!--
   Meta data
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/142955301 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/144292982 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/144299105 TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4323 TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   Hash:357601edf9fcc6440a982eb91653cc8db64b48e9 Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/144379400 TriggerType:PUSH TriggerID:357601edf9fcc6440a982eb91653cc8db64b48e9
   Hash:357601edf9fcc6440a982eb91653cc8db64b48e9 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4338 TriggerType:PUSH TriggerID:357601edf9fcc6440a982eb91653cc8db64b48e9
   -->
   ## CI report:
   
   * 760a02385546fc92d8727c0643fdbbbb3dc3c1ca Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142955301) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058) 
   * 7bfd2b1923422ea452f8e633a146273f4bb93469 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/144292982) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321) 
   * 4d0d330bc1abf80660f5e77e5c3c891f863c226f Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/144299105) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4323) 
   * 357601edf9fcc6440a982eb91653cc8db64b48e9 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/144379400) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4338) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] tillrohrmann closed pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
tillrohrmann closed pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#discussion_r366361654
 
 

 ##########
 File path: flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/ZKCheckpointIDCounterMultiServersTest.java
 ##########
 @@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.checkpoint;
+
+import org.apache.flink.runtime.zookeeper.ZooKeeperTestEnvironment;
+import org.apache.flink.util.TestLogger;
+
+import org.apache.curator.framework.CuratorFramework;
+import org.apache.curator.framework.state.ConnectionState;
+import org.apache.curator.test.TestingCluster;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.concurrent.atomic.AtomicLong;
+
+import static org.hamcrest.CoreMatchers.is;
+import static org.hamcrest.CoreMatchers.notNullValue;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link ZooKeeperCheckpointIDCounter} in a ZooKeeper ensemble.
+ */
+public final class ZKCheckpointIDCounterMultiServersTest extends TestLogger {
+
+	private static final ZooKeeperTestEnvironment ZOOKEEPER = new ZooKeeperTestEnvironment(3);
+
+	@AfterClass
+	public static void tearDown() throws Exception {
+		ZOOKEEPER.shutdown();
+	}
+
+	@Before
+	public void cleanUp() throws Exception {
+		ZOOKEEPER.deleteAll();
+	}
+
+	/**
+	 * Tests that {@link ZooKeeperCheckpointIDCounter} can be recovered after a
+	 * connection loss exception from ZooKeeper ensemble.
+	 *
+	 * See also FLINK-14091.
+	 */
+	@Test
+	public void testRecoveredAfterConnectionLoss() throws Exception {
+		CuratorFramework client = ZOOKEEPER.getClient();
+
+		ZooKeeperCheckpointIDCounter idCounter = new ZooKeeperCheckpointIDCounter(client, "/checkpoint-id-counter");
+		idCounter.start();
+
+		AtomicLong localCounter = new AtomicLong(1L);
+
+		assertThat(
+			"ZooKeeperCheckpointIDCounter doesn't properly work.",
+			idCounter.getAndIncrement(),
+			is(localCounter.getAndIncrement()));
+
+		TestingCluster cluster = ZOOKEEPER.getZooKeeperCluster();
+		assertThat(cluster, is(notNullValue()));
+
+		// close the server this client connected to, which triggers a connection loss exception
+		cluster.restartServer(cluster.findConnectionInstance(client.getZookeeperClient().getZooKeeper()));
+
+		// encountered connected loss, this prevents us from getting false positive
+		while (true) {
+			try {
+				idCounter.get();
+			} catch (IllegalStateException ignore) {
+				log.debug("Encountered connection loss.");
+				break;
+			}
+		}
+
+		// recovered from connection loss
+		while (true) {
+			try {
+				long id = idCounter.get();
+				assertThat(id, is(localCounter.get()));
+				break;
+			} catch (IllegalStateException ignore) {
+				log.debug("During ZooKeeper client reconnecting...");
+			}
+		}
 
 Review comment:
   Adding a timeout/deadline here might make sense.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-570453916
 
 
   <!--
   Meta data
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/142955301 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/144292982 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/144299105 TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4323 TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   Hash:357601edf9fcc6440a982eb91653cc8db64b48e9 Status:UNKNOWN URL:TBD TriggerType:PUSH TriggerID:357601edf9fcc6440a982eb91653cc8db64b48e9
   -->
   ## CI report:
   
   * 760a02385546fc92d8727c0643fdbbbb3dc3c1ca Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142955301) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058) 
   * 7bfd2b1923422ea452f8e633a146273f4bb93469 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/144292982) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321) 
   * 4d0d330bc1abf80660f5e77e5c3c891f863c226f Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/144299105) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4323) 
   * 357601edf9fcc6440a982eb91653cc8db64b48e9 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-570453916
 
 
   <!--
   Meta data
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/142955301 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/144292982 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/144299105 TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4323 TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   Hash:357601edf9fcc6440a982eb91653cc8db64b48e9 Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/144379400 TriggerType:PUSH TriggerID:357601edf9fcc6440a982eb91653cc8db64b48e9
   Hash:357601edf9fcc6440a982eb91653cc8db64b48e9 Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4338 TriggerType:PUSH TriggerID:357601edf9fcc6440a982eb91653cc8db64b48e9
   -->
   ## CI report:
   
   * 760a02385546fc92d8727c0643fdbbbb3dc3c1ca Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142955301) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058) 
   * 7bfd2b1923422ea452f8e633a146273f4bb93469 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/144292982) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321) 
   * 4d0d330bc1abf80660f5e77e5c3c891f863c226f Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/144299105) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4323) 
   * 357601edf9fcc6440a982eb91653cc8db64b48e9 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/144379400) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4338) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] Wangtao87 commented on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
Wangtao87 commented on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-589076093
 
 
   > Thanks for reviewing and merging this patch!
   
   SO, does it need be fixed in FLINK 1.7 ??

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot commented on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-570450954
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 760a02385546fc92d8727c0643fdbbbb3dc3c1ca (Fri Jan 03 03:41:39 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#discussion_r366364492
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCheckpointIDCounter.java
 ##########
 @@ -150,32 +171,15 @@ public void setCount(long newId) throws Exception {
 		sharedCount.setCount((int) newId);
 	}
 
-	@VisibleForTesting
-	ConnectionState getLastState() {
-		return connStateListener.lastState;
-	}
-
-	/**
-	 * Connection state listener. In case of {@link ConnectionState#SUSPENDED} or {@link
-	 * ConnectionState#LOST} we are not guaranteed to read a current count from ZooKeeper.
-	 */
-	private static class SharedCountConnectionStateListener implements ConnectionStateListener {
-
-		private volatile ConnectionState lastState;
+	private void checkConnectionState() {
+		final ConnectionState lastState = this.lastState;
 
 Review comment:
   I would not shadow the local variable. One could rename the variable `currentLastState`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-570453916
 
 
   <!--
   Meta data
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/142955301 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/144292982 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:PENDING URL:https://travis-ci.com/flink-ci/flink/builds/144299105 TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4323 TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   -->
   ## CI report:
   
   * 760a02385546fc92d8727c0643fdbbbb3dc3c1ca Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142955301) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058) 
   * 7bfd2b1923422ea452f8e633a146273f4bb93469 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/144292982) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321) 
   * 4d0d330bc1abf80660f5e77e5c3c891f863c226f Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/144299105) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4323) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] TisonKun commented on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
TisonKun commented on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-574074203
 
 
   Thanks for your review @tillrohrmann ! I push a follow-up commit for adding dedicate test for it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#discussion_r366747111
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCheckpointIDCounter.java
 ##########
 @@ -77,14 +84,26 @@ public ZooKeeperCheckpointIDCounter(CuratorFramework client, String counterPath)
 		this.client = checkNotNull(client, "Curator client");
 		this.counterPath = checkNotNull(counterPath, "Counter path");
 		this.sharedCount = new SharedCount(client, counterPath, 1);
+
+		this.connectionStateListeners = new ArrayList<>();
+		this.connectionStateListeners.add((ignore, newState) -> lastState = newState);
+	}
+
+	@VisibleForTesting
+	ZooKeeperCheckpointIDCounter(CuratorFramework client, String counterPath, Collection<ConnectionStateListener> listeners) {
+		this(client, counterPath);
+		this.connectionStateListeners.addAll(listeners);
 	}
 
 	@Override
 	public void start() throws Exception {
 		synchronized (startStopLock) {
 			if (!isStarted) {
 				sharedCount.start();
-				client.getConnectionStateListenable().addListener(connStateListener);
+
+				for (ConnectionStateListener listener : connectionStateListeners) {
+					client.getConnectionStateListenable().addListener(listener);
 
 Review comment:
   I also thought about this but I think the other approach is easier to understand.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] TisonKun commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
TisonKun commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#discussion_r366636287
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCheckpointIDCounter.java
 ##########
 @@ -77,14 +84,26 @@ public ZooKeeperCheckpointIDCounter(CuratorFramework client, String counterPath)
 		this.client = checkNotNull(client, "Curator client");
 		this.counterPath = checkNotNull(counterPath, "Counter path");
 		this.sharedCount = new SharedCount(client, counterPath, 1);
+
+		this.connectionStateListeners = new ArrayList<>();
+		this.connectionStateListeners.add((ignore, newState) -> lastState = newState);
+	}
+
+	@VisibleForTesting
+	ZooKeeperCheckpointIDCounter(CuratorFramework client, String counterPath, Collection<ConnectionStateListener> listeners) {
+		this(client, counterPath);
+		this.connectionStateListeners.addAll(listeners);
 	}
 
 	@Override
 	public void start() throws Exception {
 		synchronized (startStopLock) {
 			if (!isStarted) {
 				sharedCount.start();
-				client.getConnectionStateListenable().addListener(connStateListener);
+
+				for (ConnectionStateListener listener : connectionStateListeners) {
+					client.getConnectionStateListenable().addListener(listener);
 
 Review comment:
   Another way is that we instead implement a chain of listeners so that the order is deterministic.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] tillrohrmann commented on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-590962421
 
 
   > > Thanks for reviewing and merging this patch!
   > 
   > SO, does it need be fixed in FLINK 1.7 ??
   
   @Wangtao87 the community no longer actively supports Flink 1.7. Hence you would need to backport the fix to this version yourself if needed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-570453916
 
 
   <!--
   Meta data
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/142955301 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   -->
   ## CI report:
   
   * 760a02385546fc92d8727c0643fdbbbb3dc3c1ca Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142955301) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#discussion_r366362475
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCheckpointIDCounter.java
 ##########
 @@ -149,6 +150,11 @@ public void setCount(long newId) throws Exception {
 		sharedCount.setCount((int) newId);
 	}
 
+	@VisibleForTesting
+	ConnectionState getLastState() {
+		return connStateListener.lastState;
+	}
 
 Review comment:
   I think this exposes internal details which are not relevant for the test. I would try to write the test without exposing these internals.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] TisonKun commented on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
TisonKun commented on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-570450614
 
 
   also cc @lamber-ken 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-570453916
 
 
   <!--
   Meta data
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/142955301 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/144292982 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:FAILURE URL:https://travis-ci.com/flink-ci/flink/builds/144299105 TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4323 TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   Hash:357601edf9fcc6440a982eb91653cc8db64b48e9 Status:PENDING URL:https://travis-ci.com/flink-ci/flink/builds/144379400 TriggerType:PUSH TriggerID:357601edf9fcc6440a982eb91653cc8db64b48e9
   Hash:357601edf9fcc6440a982eb91653cc8db64b48e9 Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4338 TriggerType:PUSH TriggerID:357601edf9fcc6440a982eb91653cc8db64b48e9
   -->
   ## CI report:
   
   * 760a02385546fc92d8727c0643fdbbbb3dc3c1ca Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142955301) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058) 
   * 7bfd2b1923422ea452f8e633a146273f4bb93469 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/144292982) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321) 
   * 4d0d330bc1abf80660f5e77e5c3c891f863c226f Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/144299105) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4323) 
   * 357601edf9fcc6440a982eb91653cc8db64b48e9 Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/144379400) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4338) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-570453916
 
 
   <!--
   Meta data
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/142955301 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:UNKNOWN URL:TBD TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   -->
   ## CI report:
   
   * 760a02385546fc92d8727c0643fdbbbb3dc3c1ca Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142955301) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058) 
   * 7bfd2b1923422ea452f8e633a146273f4bb93469 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#issuecomment-570453916
 
 
   <!--
   Meta data
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://travis-ci.com/flink-ci/flink/builds/142955301 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:760a02385546fc92d8727c0643fdbbbb3dc3c1ca Status:SUCCESS URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058 TriggerType:PUSH TriggerID:760a02385546fc92d8727c0643fdbbbb3dc3c1ca
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:PENDING URL:https://travis-ci.com/flink-ci/flink/builds/144292982 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:7bfd2b1923422ea452f8e633a146273f4bb93469 Status:PENDING URL:https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321 TriggerType:PUSH TriggerID:7bfd2b1923422ea452f8e633a146273f4bb93469
   Hash:4d0d330bc1abf80660f5e77e5c3c891f863c226f Status:UNKNOWN URL:TBD TriggerType:PUSH TriggerID:4d0d330bc1abf80660f5e77e5c3c891f863c226f
   -->
   ## CI report:
   
   * 760a02385546fc92d8727c0643fdbbbb3dc3c1ca Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142955301) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4058) 
   * 7bfd2b1923422ea452f8e633a146273f4bb93469 Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/144292982) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4321) 
   * 4d0d330bc1abf80660f5e77e5c3c891f863c226f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] TisonKun commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK

Posted by GitBox <gi...@apache.org>.
TisonKun commented on a change in pull request #10754: [FLINK-14091][coordination] Allow updates to connection state when ZKCheckpointIDCounter reconnects to ZK
URL: https://github.com/apache/flink/pull/10754#discussion_r366406377
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCheckpointIDCounter.java
 ##########
 @@ -77,14 +84,26 @@ public ZooKeeperCheckpointIDCounter(CuratorFramework client, String counterPath)
 		this.client = checkNotNull(client, "Curator client");
 		this.counterPath = checkNotNull(counterPath, "Counter path");
 		this.sharedCount = new SharedCount(client, counterPath, 1);
+
+		this.connectionStateListeners = new ArrayList<>();
+		this.connectionStateListeners.add((ignore, newState) -> lastState = newState);
+	}
+
+	@VisibleForTesting
+	ZooKeeperCheckpointIDCounter(CuratorFramework client, String counterPath, Collection<ConnectionStateListener> listeners) {
+		this(client, counterPath);
+		this.connectionStateListeners.addAll(listeners);
 	}
 
 	@Override
 	public void start() throws Exception {
 		synchronized (startStopLock) {
 			if (!isStarted) {
 				sharedCount.start();
-				client.getConnectionStateListenable().addListener(connStateListener);
+
+				for (ConnectionStateListener listener : connectionStateListeners) {
+					client.getConnectionStateListenable().addListener(listener);
 
 Review comment:
   Nice catch!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services