Posted to dev@tinkerpop.apache.org by "Jayanta Mondal (Jira)" <ji...@apache.org> on 2019/09/03 21:02:00 UTC

[jira] [Commented] (TINKERPOP-2288) Get ConnectionPoolBusyException and then ServerUnavailableExceptions

    [ https://issues.apache.org/jira/browse/TINKERPOP-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921721#comment-16921721 ] 

Jayanta Mondal commented on TINKERPOP-2288:
-------------------------------------------

The conditions under which ServerUnavailableException() is thrown can cause a lot of confusion for client applications. The name suggests that the server is not accepting connections, but in reality the Gremlin.Net client never even contacts the server to get a new connection. In the paragraphs below, I try to explain why this may be the case.

 

ServerUnavailableException() is thrown when there are no active connections in the connection pool (condition B), which can happen when a connection is requested from a pool that is holding onto a set of dead connections. A connection in the pool can be dead if it has not been used for a while and the server has terminated it in order to recycle idle connections. In other words, if the client creates a connection pool, does not use the connections for a long time, and then requests a connection to send requests, it can run into ServerUnavailableException(). While the new connection is being requested, the dead connections in the pool are identified as closed (condition A) and removed from the pool, and the request then hits condition B. The request can also hit condition C (which eventually leads to B) and throw ConnectionPoolBusyException(), if the dead connections are in the process of being removed from the pool but not all of them have been removed yet.
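The failure mode above can be sketched as a simplified Python model (not the actual Gremlin.Net C# source; the class and method names here are illustrative stand-ins):

```python
# Simplified model of a connection pool that is holding onto dead
# connections. The server may silently close idle connections, so by
# the time the client asks for one, nothing usable is left.

class ConnectionPoolBusyException(Exception): pass
class ServerUnavailableException(Exception): pass

class Connection:
    def __init__(self):
        self.is_open = True  # may be flipped to False when the server recycles it

class Pool:
    def __init__(self, size):
        self.connections = [Connection() for _ in range(size)]

    def get_connection(self):
        # Condition A: requested connections are identified as closed
        # and removed from the pool.
        self.connections = [c for c in self.connections if c.is_open]
        if not self.connections:
            # Condition B: no active connection left -- the exception is
            # thrown without ever contacting the server.
            raise ServerUnavailableException("no active connection in the pool")
        return self.connections[0]

pool = Pool(size=4)
for c in pool.connections:
    c.is_open = False  # simulate the server recycling all idle connections

try:
    pool.get_connection()
except ServerUnavailableException as e:
    print("got:", e)
```

Note that nothing in this path ever opens a socket, which is why the exception name is misleading.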

 

The line numbers below refer to the 3.4.3 source: [https://github.com/apache/tinkerpop/releases/tag/3.4.3]

 

[https://github.com/apache/tinkerpop/blob/f203acaa4b3abcf4ee094aa70afb8496c732030e/gremlin-dotnet/src/Gremlin.Net/Driver/ConnectionPool.cs]

 

Condition A: line 127

Condition B: line 114

Condition C: line 134

 

The call to EnsurePoolIsPopulatedAsync() is meant to protect against the behavior above, but it can get bypassed, which looks like a race condition / incorrect behavior:

 
 # The connection pool is populated via PopulatePoolAsync() (line 51),
 # but EnsurePoolIsPopulatedAsync() might exit without repopulating due to line 65 (when the pool is holding onto a set of dead connections).
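The early return in step 2 can be sketched as follows (a simplified Python model of the guard at line 65 of the C# source; the function shape here is an assumption, not the real implementation):

```python
# Simplified model of EnsurePoolIsPopulatedAsync(). The guard compares
# the target pool size against the raw connection count, which still
# includes dead connections, so the pool is never repopulated.

def ensure_pool_is_populated(pool_size, connections):
    # Mirrors "if (_poolSize <= NrConnections) return;" (line 65).
    if pool_size <= len(connections):
        return connections
    # Otherwise new connections would be opened to fill the pool.
    return connections + ["new"] * (pool_size - len(connections))

# Four connections, all dead -- the guard still sees a "full" pool:
dead_pool = ["dead"] * 4
result = ensure_pool_is_populated(4, dead_pool)
assert result == dead_pool  # no repopulation happens
```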

 

*What users can do:* Users can observe multiple ConnectionPoolBusyException() and ServerUnavailableException() exceptions, since more than one new connection may be requested in parallel. Users need to catch these, retry their requests, and wait for the connection pool to ultimately be repopulated.
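A minimal sketch of that catch-and-retry workaround (the exception classes and submit() callback below are hypothetical stand-ins, not the real Gremlin.Net types):

```python
import time

# Stand-in exception types for this sketch:
class ConnectionPoolBusyException(Exception): pass
class ServerUnavailableException(Exception): pass

def submit_with_retry(submit, query, max_attempts=5, delay=0.1):
    """Retry submit(query), giving the pool time to be repopulated."""
    for attempt in range(1, max_attempts + 1):
        try:
            return submit(query)
        except (ConnectionPoolBusyException, ServerUnavailableException):
            if attempt == max_attempts:
                raise  # pool never recovered; surface the error
            time.sleep(delay * attempt)  # back off before retrying

# Example: a submit() that fails twice while the pool repopulates.
calls = {"n": 0}
def flaky_submit(query):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ServerUnavailableException()
    return "result"

print(submit_with_retry(flaky_submit, "g.V().count()"))
```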

 

*What Gremlin.Net can do:* Perhaps Gremlin.Net needs to find a way to contact the server before throwing a ServerUnavailableException(). I will leave the implementation to the TinkerPop team, but here is a suggestion that I have in mind:
 # *Change* (line 65) if (_poolSize <= NrConnections) return; *to* if (_poolSize <= NrActiveConnections) return;
 # This can be done by first iterating over all the connections and removing the closed/dead ones.
 # I also believe that this is the right contract for EnsurePoolIsPopulatedAsync().
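The suggested fix could look like this sketch (again a simplified Python model under the assumptions above, not a patch against the C# source):

```python
# Sketch of the suggested fix: prune closed connections first, then
# compare against the number of *active* connections before deciding
# whether the pool needs repopulating.

class Connection:
    def __init__(self, is_open=True):
        self.is_open = is_open

def ensure_pool_is_populated_fixed(pool_size, connections, open_connection):
    # Step 1 of the suggestion: drop closed/dead connections.
    active = [c for c in connections if c.is_open]
    # Step 2: mirrors "if (_poolSize <= NrActiveConnections) return;"
    if pool_size <= len(active):
        return active
    # Repopulate up to the configured size with fresh connections.
    return active + [open_connection() for _ in range(pool_size - len(active))]

# A pool of four dead connections is now actually repaired:
pool = [Connection(is_open=False) for _ in range(4)]
repaired = ensure_pool_is_populated_fixed(4, pool, open_connection=Connection)
assert len(repaired) == 4 and all(c.is_open for c in repaired)
```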

> Get ConnectionPoolBusyException and then ServerUnavailableExceptions
> --------------------------------------------------------------------
>
>                 Key: TINKERPOP-2288
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2288
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: dotnet
>    Affects Versions: 3.4.3
>         Environment: Gremlin.Net 3.4.3
> Microsoft.NetCore.App 2.2
> Azure Cosmos DB
>            Reporter: patrice huot
>            Priority: Critical
>
> I am using the .NET Core Gremlin API to query Cosmos DB.
> From time to time we are getting an error saying that no connection is available, and then the server becomes unavailable. When this occurs we need to restart the server. It looks like the connections are not released properly and become unavailable forever.
> We have configured the pool size to 50 and MaxInProcessPerConnection to 32 (which I guess should be sufficient).
> To diagnose the issue, is there a way to access diagnostic information on the connection pool in order to know how many connections are open and how many processes are running in each connection?
> I would like to be able to monitor connection usage to see if the connections are about to be exhausted, and to see whether the number of used connections keeps increasing or whether the connection lease is released when a query completes.
> As a workaround, is there a way to access this information from the code so that I can catch those scenarios and add logic that re-initiates the connection pool?
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)