Posted to dev@tinkerpop.apache.org by "Kieran Sherlock (JIRA)" <ji...@apache.org> on 2016/02/04 23:22:40 UTC
[jira] [Created] (TINKERPOP-1127) client fails to reconnect to restarted server
Kieran Sherlock created TINKERPOP-1127:
------------------------------------------
Summary: client fails to reconnect to restarted server
Key: TINKERPOP-1127
URL: https://issues.apache.org/jira/browse/TINKERPOP-1127
Project: TinkerPop
Issue Type: Bug
Components: driver
Affects Versions: 3.1.0-incubating
Reporter: Kieran Sherlock
If a gremlin-server is restarted, the client will never reconnect to it.
Start server1
Start server2
Start a client such as:
{code:java}
import java.util.concurrent.TimeUnit;

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.MessageSerializer;
import org.apache.tinkerpop.gremlin.driver.ResultSet;
import org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0;
import org.apache.tinkerpop.gremlin.structure.io.gryo.GryoMapper;

import com.thinkaurelius.titan.graphdb.tinkerpop.TitanIoRegistry;

// (class wrapper and imports added here to make the repro self-contained)
public class ReconnectTest {
    public static void main(String[] args) {
        // Gryo serializer with Titan's types registered
        GryoMapper kryo = GryoMapper.build().addRegistry(TitanIoRegistry.INSTANCE).create();
        MessageSerializer serializer = new GryoMessageSerializerV1d0(kryo);
        Cluster titanCluster = Cluster.build()
                .addContactPoints("54.X.X.X,54.Y.Y.Y".split(","))
                .port(8182)
                .minConnectionPoolSize(5)
                .maxConnectionPoolSize(10)
                .reconnectIntialDelay(1000)  // sic: the 3.1.0 builder method is spelled this way
                .reconnectInterval(30000)
                .serializer(serializer)
                .create();
        Client client = titanCluster.connect();
        client.init();
        System.out.println("initialized");
        for (int i = 0; i < 200; i++) {
            try {
                // write a vertex keyed on the current time, then read it back
                long id = System.currentTimeMillis();
                ResultSet results = client.submit("graph.addVertex('a','" + id + "')");
                results.one();
                results = client.submit("g.V().has('a','" + id + "')");
                System.out.println(results.one());
            } catch (Exception e) {
                e.printStackTrace();
            }
            try {
                TimeUnit.SECONDS.sleep(3);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        System.out.println("done");
        client.close();
        System.exit(0);
    }
}
{code}
After the client has performed a couple of query cycles:
Restart server1
Wait 60 seconds (two reconnectInterval periods) so the reconnect should occur
Stop server2
Notice that there are no more successful queries; the client never reconnected to server1
Start server2
Notice that there are still no successful queries
The method ConnectionPool.addConnectionIfUnderMaximum always returns false because open >= maxPoolSize; in this particular case open = 10. I believe the open counter is meant to track the size of the List of connections but is getting out of sync with it.
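For context, that check is essentially a compare-and-set loop over the shared open counter, so once the counter drifts up to maxPoolSize without ever coming back down, the pool can never replace a dead connection. A paraphrased, self-contained sketch (simplified as an assumption, not the actual 3.1.0 class; the class name is invented here):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Paraphrased sketch of the admission check in ConnectionPool (assumption:
// simplified, not the actual 3.1.0 source). New connections are admitted by
// CAS-incrementing the shared "open" counter, so if "open" is never
// decremented when connections are destroyed, this loop eventually returns
// false forever.
final class PoolCounterSketch {
    private final AtomicInteger open = new AtomicInteger(0);
    private final int maxPoolSize = 10;

    boolean addConnectionIfUnderMaximum() {
        while (true) {
            final int opened = open.get();
            if (opened >= maxPoolSize)
                return false;             // stuck here once "open" is pinned at max
            if (open.compareAndSet(opened, opened + 1))
                return true;              // caller would now actually open a Connection
        }
    }
}
{code}
The following diff addresses this problem for this particular case: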
{code:diff}
diff --git a/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java b/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java
index 96c151c..81ce81d 100644
--- a/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java
+++ b/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java
@@ -326,6 +326,7 @@ final class ConnectionPool {
     private void definitelyDestroyConnection(final Connection connection) {
         bin.add(connection);
         connections.remove(connection);
+        open.decrementAndGet();
 
         if (connection.borrowed.get() == 0 && bin.remove(connection))
             connection.closeAsync();
@@ -388,6 +389,8 @@ final class ConnectionPool {
         // if the host is unavailable then we should release the connections
         connections.forEach(this::definitelyDestroyConnection);
+        // there are no connections open
+        open.set(0);
 
         // let the load-balancer know that the host is acting poorly
         this.cluster.loadBalancingStrategy().onUnavailable(host);
 
@@ -413,6 +416,7 @@ final class ConnectionPool {
             this.cluster.loadBalancingStrategy().onAvailable(host);
             return true;
         } catch (Exception ex) {
+            logger.debug("Failed reconnect attempt on {}", host);
             if (connection != null) definitelyDestroyConnection(connection);
             return false;
         }
{code}
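To make the interplay of the first two hunks concrete: the per-destroy decrement keeps open in step with the connections list during normal churn, while open.set(0) on the host-unavailable path clears any residual drift so the pool can grow again once the host comes back. A toy model of that lifecycle (hypothetical names, not TinkerPop code):
{code:java}
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the patched counter lifecycle (hypothetical, not TinkerPop
// code): destroying a connection decrements "open" alongside the list
// removal, and marking the host unavailable resets the counter so the
// admission check can pass again on reconnect.
public class CounterLifecycleDemo {
    static final AtomicInteger open = new AtomicInteger();
    static final CopyOnWriteArrayList<Object> connections = new CopyOnWriteArrayList<>();

    static void openConnection() {
        connections.add(new Object());
        open.incrementAndGet();
    }

    static void definitelyDestroyConnection(Object connection) {
        connections.remove(connection);
        open.decrementAndGet();  // first hunk of the patch
    }

    static void hostUnavailable() {
        connections.forEach(CounterLifecycleDemo::definitelyDestroyConnection);
        open.set(0);             // second hunk: no connections remain open
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) openConnection();
        hostUnavailable();
        System.out.println("open=" + open.get() + ", size=" + connections.size());  // open=0, size=0
    }
}
{code}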
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)