You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tinkerpop.apache.org by sp...@apache.org on 2021/05/19 18:40:00 UTC

[tinkerpop] branch TINKERPOP-2569 created (now 9f3e074)

This is an automated email from the ASF dual-hosted git repository.

spmallette pushed a change to branch TINKERPOP-2569
in repository https://gitbox.apache.org/repos/asf/tinkerpop.git.


      at 9f3e074  TINKERPOP-2569 Reconnect if java driver fails to initialize

This branch includes the following new commits:

     new 9f3e074  TINKERPOP-2569 Reconnect if java driver fails to initialize

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


[tinkerpop] 01/01: TINKERPOP-2569 Reconnect if java driver fails to initialize

Posted by sp...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

spmallette pushed a commit to branch TINKERPOP-2569
in repository https://gitbox.apache.org/repos/asf/tinkerpop.git

commit 9f3e074c284441c03e92d6102eae3e7609d0c65a
Author: Stephen Mallette <st...@amazon.com>
AuthorDate: Wed May 19 14:34:50 2021 -0400

    TINKERPOP-2569 Reconnect if java driver fails to initialize
    
    Does result in a slight change in behavior as client initialization is now aware of dead hosts and will throw a NoHostAvailableException earlier (used to wait for a request to be submitted). This is probably better behavior. It seems unlikely that anyone would rely on this behavior for normal usage so pushing it to 3.4.x seems safe.
---
 CHANGELOG.asciidoc                                 |  2 +
 .../tinkerpop/gremlin/driver/ConnectionPool.java   |  7 +++-
 .../gremlin/server/GremlinDriverIntegrateTest.java | 21 ++++++----
 .../gremlin/server/GremlinServerIntegrateTest.java | 49 +++++++++++++++++++++-
 4 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/CHANGELOG.asciidoc b/CHANGELOG.asciidoc
index 46fdeb1..2d24b58 100644
--- a/CHANGELOG.asciidoc
+++ b/CHANGELOG.asciidoc
@@ -23,6 +23,8 @@ image::https://raw.githubusercontent.com/apache/tinkerpop/master/docs/static/ima
 [[release-3-4-12]]
 === TinkerPop 3.4.12 (Release Date: NOT OFFICIALLY RELEASED YET)
 
+* Fixed bug in Java driver where it would not reconnect to a host that it did not successfully connect to at initialization.
+* Changed behavior for Java driver `Client` such that a failed initialization will notify other existing `Client` instances of a dead host and therefore throw `NoHostAvailableException`.
 
 [[release-3-4-11]]
 === TinkerPop 3.4.11 (Release Date: May 3, 2021)
diff --git a/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java b/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java
index 82a7ae8..5e7252b 100644
--- a/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java
+++ b/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java
@@ -111,8 +111,11 @@ final class ConnectionPool {
             logger.warn("Initialization of connections cancelled for {}", getPoolInfo(), ce);
             throw ce;
         } catch (CompletionException ce) {
-            // Some connections might have been initialized. Close the connection pool gracefully to close them.
-            this.closeAsync();
+            // since there was some error here, the host is likely unavailable at this point. if we dont mark it as
+            // such the client won't try to reconnect to the dead host. need to initialize this.open because if we
+            // don't then reconnects will fail when trying to create a new one with addConnectionIfUnderMaximum()
+            this.open = new AtomicInteger(0);
+            considerHostUnavailable();
 
             final String errMsg = "Could not initialize " + minPoolSize + " (minPoolSize) connections in pool." +
                     " Successful connections=" + this.connections.size() +
diff --git a/gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/server/GremlinDriverIntegrateTest.java b/gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/server/GremlinDriverIntegrateTest.java
index 0e03486..a7dd556 100644
--- a/gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/server/GremlinDriverIntegrateTest.java
+++ b/gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/server/GremlinDriverIntegrateTest.java
@@ -892,17 +892,17 @@ public class GremlinDriverIntegrateTest extends AbstractGremlinServerIntegration
             stopServer();
 
             // We create a new client here which will fail to initialize but the original client still has
-            // host marked as connected. Since the second client failed during initialization, it has no way to
-            // test if a host is indeed unreachable because it doesn't have any established connections. It will not add
-            // the host to load balancer but it will also not remove it if it already exists there. Leave that
-            // responsibility to a client that added it. In this case, let the second client perform it's own mechanism
-            // to mark host as unavailable. The first client will discover that the host has failed either with next
-            // keepAlive message or the next request, whichever is earlier. In this case, we will simulate the second
-            // scenario by sending a new request on first client. The request would fail (since server is down) and
-            // client should mark the host unavailable.
-            cluster.connect().init();
+            // host marked as connected. The second client will mark the Host as unavailable when it tries to
+            // initialize the ConnectionPool.
+            try {
+                cluster.connect().init();
+                fail("Should have thrown NoHostAvailableException");
+            } catch (NoHostAvailableException ignored) {
+                // ignored
+            }
 
             try {
+                // client1 should now be dead
                 client1.submit("1+1").all().join();
                 fail("Expecting an exception because the server is shut down.");
             } catch (Exception ex) {
@@ -910,6 +910,9 @@ public class GremlinDriverIntegrateTest extends AbstractGremlinServerIntegration
             }
 
             assertEquals(0, cluster.availableHosts().size());
+
+            // reconnect gets tested in a similar fashion in GremlinServerIntegrateTest#shouldFailOnInitiallyDeadHost
+
         } finally {
             cluster.close();
         }
diff --git a/gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/server/GremlinServerIntegrateTest.java b/gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/server/GremlinServerIntegrateTest.java
index 8a63694..9038d90 100644
--- a/gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/server/GremlinServerIntegrateTest.java
+++ b/gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/server/GremlinServerIntegrateTest.java
@@ -29,6 +29,7 @@ import org.apache.tinkerpop.gremlin.driver.Cluster;
 import org.apache.tinkerpop.gremlin.driver.Result;
 import org.apache.tinkerpop.gremlin.driver.ResultSet;
 import org.apache.tinkerpop.gremlin.driver.Tokens;
+import org.apache.tinkerpop.gremlin.driver.exception.NoHostAvailableException;
 import org.apache.tinkerpop.gremlin.driver.exception.ResponseException;
 import org.apache.tinkerpop.gremlin.driver.message.RequestMessage;
 import org.apache.tinkerpop.gremlin.driver.message.ResponseMessage;
@@ -900,7 +901,7 @@ public class GremlinServerIntegrateTest extends AbstractGremlinServerIntegration
 
         try {
             // try to re-issue a request now that the server is down
-            client.submit("g").all().join();
+            client.submit("g").all().get(3000, TimeUnit.MILLISECONDS);
             fail("Should throw an exception.");
         } catch (RuntimeException re) {
             // Client would have no active connections to the host, hence it would encounter a timeout
@@ -923,7 +924,7 @@ public class GremlinServerIntegrateTest extends AbstractGremlinServerIntegration
 
                 try {
                     // just need to succeed at reconnect one time
-                    final List<Result> results = client.submit("1+1").all().join();
+                    final List<Result> results = client.submit("1+1").all().get(3000, TimeUnit.MILLISECONDS);
                     assertEquals(1, results.size());
                     assertEquals(2, results.get(0).getInt());
                     break;
@@ -938,6 +939,50 @@ public class GremlinServerIntegrateTest extends AbstractGremlinServerIntegration
     }
 
     @Test
+    public void shouldFailOnInitiallyDeadHost() throws Exception {
+
+        // start test with no server
+        this.stopServer();
+
+        final Cluster cluster = TestClientFactory.build().create();
+        final Client client = cluster.connect();
+
+        try {
+            // try to re-issue a request now that the server is down
+            client.submit("g").all().get(3000, TimeUnit.MILLISECONDS);
+            fail("Should throw an exception.");
+        } catch (RuntimeException re) {
+            // Client would have no active connections to the host, hence it would encounter a timeout
+            // trying to find an alive connection to the host.
+            assertThat(re.getCause(), instanceOf(NoHostAvailableException.class));
+
+            //
+            // should recover when the server comes back
+            //
+
+            // restart server
+            this.startServer();
+
+            // try a bunch of times to reconnect. on slower systems this may simply take longer...looking at you travis
+            for (int ix = 1; ix < 11; ix++) {
+                // the retry interval is 1 second, wait a bit longer
+                TimeUnit.SECONDS.sleep(5);
+
+                try {
+                    final List<Result> results = client.submit("1+1").all().get(3000, TimeUnit.MILLISECONDS);
+                    assertEquals(1, results.size());
+                    assertEquals(2, results.get(0).getInt());
+                } catch (Exception ex) {
+                    if (ix == 10)
+                        fail("Should have eventually succeeded");
+                }
+            }
+        } finally {
+            cluster.close();
+        }
+    }
+
+    @Test
     public void shouldNotHavePartialContentWithOneResult() throws Exception {
         try (SimpleClient client = TestClientFactory.createWebSocketClient()) {
             final RequestMessage request = RequestMessage.build(Tokens.OPS_EVAL)