Posted to dev@tinkerpop.apache.org by "Florian Hockmann (Jira)" <ji...@apache.org> on 2020/04/08 15:38:00 UTC

[jira] [Commented] (TINKERPOP-2019) Gremlin.Net.Driver.WebSocketConnection throws System.InvalidOperationException

    [ https://issues.apache.org/jira/browse/TINKERPOP-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078396#comment-17078396 ] 

Florian Hockmann commented on TINKERPOP-2019:
---------------------------------------------

I looked into this some more as I wanted to create an issue in the {{dotnet}} repo. After looking into [the source code of the {{ManagedWebSocket}} class|https://github.com/dotnet/runtime/blob/master/src/libraries/Common/src/System/Net/WebSockets/ManagedWebSocket.cs], I think that the problem might not be two parallel calls to {{ReceiveAsync}} after all, but rather that we call {{CloseAsync}} while a {{ReceiveAsync}} has not completed yet, or vice versa. {{CloseAsync}} also internally calls {{ReceiveAsyncPrivate}} to receive a close response from the server.
 In general, {{ManagedWebSocket}} shouldn't have a problem when {{CloseAsync}} is called while a {{ReceiveAsync}} operation is still in progress, as that [was an issue that was already fixed in 2016|https://github.com/dotnet/runtime/issues/17819].
 But I wonder whether we could be seeing some kind of race condition here, where the checks that were added for the mentioned issue succeed although the task has not fully completed yet, or something like that. The fact that you needed to perform 4.5 million calls under bad network conditions until the exception occurred, if I understood that correctly, [~dzmitry.lahoda], could also be seen as a hint that we have a rare race condition here.

I see two options to proceed further here. We could either:
 * check whether we can simply fix this by cancelling any pending operations on the {{ClientWebSocket}} in our {{WebSocketConnection}} class with a {{CancellationTokenSource}} before calling {{CloseAsync}}, or
 * try to reproduce this problem with a minimalistic example outside of Gremlin.Net and then use that example to create an issue in the {{dotnet}} repo.

I think that we need a minimalistic example that still reproduces this exception to get meaningful help from the {{dotnet}} team, as they otherwise cannot be sure that the error is not in our usage of {{ClientWebSocket}}. Such an example would of course also rule out that possibility for us.
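
A starting point for such a minimalistic example could look like the sketch below. It is only a sketch under assumptions: the class name {{CloseReceiveRaceRepro}}, the iteration count and the placeholder server URL are all hypothetical, and it is not known whether this loop actually triggers the exception. It simply starts a {{ReceiveAsync}} and a {{CloseAsync}} on the same {{ClientWebSocket}} without waiting in between and reports the first {{InvalidOperationException}}.

{code:c#}
using System;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical repro harness; not part of Gremlin.Net.
public static class CloseReceiveRaceRepro
{
    public static async Task Main(string[] args)
    {
        // Assumption: some WebSocket server is reachable here. The URL is a placeholder.
        var uri = new Uri(args.Length > 0 ? args[0] : "ws://localhost:8182/gremlin");

        for (var i = 0; i < 100_000; i++)
        {
            using (var client = new ClientWebSocket())
            {
                await client.ConnectAsync(uri, CancellationToken.None);
                var buffer = new ArraySegment<byte>(new byte[1024]);
                try
                {
                    // Start a receive and immediately start closing, so that both overlap.
                    var receive = client.ReceiveAsync(buffer, CancellationToken.None);
                    var close = client.CloseAsync(
                        WebSocketCloseStatus.NormalClosure, string.Empty, CancellationToken.None);
                    await Task.WhenAll(receive, close);
                }
                catch (InvalidOperationException e)
                {
                    // This is the exception type reported in this issue.
                    Console.WriteLine($"Iteration {i}: {e.Message}");
                    return;
                }
                catch (WebSocketException)
                {
                    // Other connection failures are not what we are looking for; try again.
                }
            }
        }

        Console.WriteLine("No InvalidOperationException observed.");
    }
}
{code}

If a loop like this never fails, that would point back at our own usage of {{ClientWebSocket}} rather than at {{ManagedWebSocket}}.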

Using a {{CancellationTokenSource}} to cancel any operations before calling {{CloseAsync}} is probably a good idea in general, but it would of course be good to know whether that already solves the problem.
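
The idea could be sketched roughly as follows. This is not the actual {{WebSocketConnection}} implementation: the class {{CancellableWebSocketConnection}} and its members are hypothetical and only illustrate the pattern. One caveat that the sketch assumes: cancelling an in-flight operation appears to abort the underlying socket in {{ManagedWebSocket}}, so the close handshake is only attempted if the socket is still open afterwards.

{code:c#}
using System;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper for illustration; not the Gremlin.Net WebSocketConnection class.
public sealed class CancellableWebSocketConnection : IDisposable
{
    private readonly ClientWebSocket _client = new ClientWebSocket();
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private Task _pendingReceive = Task.CompletedTask;

    public Task ConnectAsync(Uri uri) => _client.ConnectAsync(uri, _cts.Token);

    public Task<WebSocketReceiveResult> ReceiveAsync(ArraySegment<byte> buffer)
    {
        // Pass the shared token so that this receive can be cancelled before closing.
        var receive = _client.ReceiveAsync(buffer, _cts.Token);
        _pendingReceive = receive;
        return receive;
    }

    public async Task CloseAsync()
    {
        // Cancel the pending receive and wait until it has actually finished.
        _cts.Cancel();
        try
        {
            await _pendingReceive.ConfigureAwait(false);
        }
        catch (Exception)
        {
            // The receive was cancelled or failed; either way it is no longer outstanding.
        }

        // Assumption: the cancellation may have aborted the socket, so check the state first.
        if (_client.State == WebSocketState.Open)
        {
            await _client.CloseAsync(
                WebSocketCloseStatus.NormalClosure, string.Empty, CancellationToken.None)
                .ConfigureAwait(false);
        }
    }

    public void Dispose()
    {
        _cts.Dispose();
        _client.Dispose();
    }
}
{code}

Awaiting the cancelled receive before touching the socket again is the important part here, since it guarantees that no {{ReceiveAsync}} is outstanding when {{CloseAsync}} runs.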

I'll try to follow up on this and try out one or both approaches, but I'm not sure yet when I'll have the time for this. So, if anyone wants to take this up, then that would be greatly appreciated.

> Gremlin.Net.Driver.WebSocketConnection throws System.InvalidOperationException
> ------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-2019
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2019
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: dotnet
>    Affects Versions: 3.3.3
>         Environment: Azure App Service
>            Reporter: Sami
>            Priority: Critical
>         Attachments: image-2020-02-21-05-32-58-730.png, image-2020-02-21-05-33-27-246.png, invalid.txt
>
>
> We're getting the following {{System.InvalidOperationException}} error message:
> {code:c#}
> "There is already one outstanding 'SendAsync' call for this WebSocket instance. ReceiveAsync and SendAsync can be called simultaneously, but at most one outstanding operation for each of them is allowed at the same time.
> Problem Id:
> System.InvalidOperationException at Gremlin.Net.Driver.WebSocketConnection+<SendMessageAsync>d__5.MoveNext"{code}
>  
>  We get this exception sporadically and only a few times out of thousands. Unfortunately we have not been able to reproduce it.
>   
>  I understand that when dealing with web sockets, it is allowed to have only a single pending "send" or a single pending "receive".
>   
>  After looking at GitHub's WebSocketConnection class, I don't see any orchestration between SendMessageAsync's {{_client.SendAsync}} (currently line 54) and ReceiveMessageAsync's {{_client.ReceiveAsync}} (currently line 66). 
>   
>  Reference Link: 
>  [https://github.com/apache/tinkerpop/blob/master/gremlin-dotnet/src/Gremlin.Net/Driver/WebSocketConnection.cs]
>   
>  I'm wondering if not having orchestration in the WebSocketConnection class to keep the single pending "send" or a single pending "receive" rule may be the cause. 
>   
>  In our .NET Core web api application, we create the GremlinConnection as a singleton in Startup.cs and then have one central call that makes Gremlin calls; i.e. it's a very straightforward implementation.
>   
>  Startup.cs:
> {code:c#}
> public void ConfigureServices(IServiceCollection services)
> {
>     //...other stuff removed for brevity
>     services.AddSingleton<IGremlinConnection, GremlinConnection>();
> }{code}
>  
>  Reader.cs:
> {code:c#}
> public async Task<IReadOnlyCollection<dynamic>> ExecuteGremlinQuery(string query)
> {
>     try
>     {
>         return await _gremlinConnection.Client.SubmitAsync<dynamic>(query);
>     }
>     catch (Gremlin.Net.Driver.Exceptions.ResponseException responseException)
>     {
>         //our error handling removed for brevity!    
>     }
> }{code}
>   
>  We use the Gremlin.Net version 3.3.3 nuget package and the Microsoft.NETCore.App SDK
>   
>  Would it be possible to identify if this is indeed a bug in Gremlin.Net? 
>  And if it is, any thoughts on a best-practice (temporary) work-around that we can implement?


