Posted to dev@tinkerpop.apache.org by Roi Martin <jr...@gmail.com> on 2022/05/25 13:53:18 UTC

Deadlock in Gremlin Go on read error

Hi,

First of all, sorry if this is not the right channel for reporting a bug. I saw
that the Developer Documentation mentions JIRA, but I was not sure whether
non-developers can create new issues there.


Description of the issue:

Gremlin Go hangs when the connection to the database is dropped while a long
traversal is executing. For context, the database I'm using is AWS Neptune,
which I access through local port-forwarding. I noticed that Gremlin Go hung
whenever the port-forwarding died or became unstable in the middle of a
traversal.

How to reproduce the issue:

1. Execute an expensive traversal.
2. Drop the connection to the DB in the middle of the execution.
3. Observe that Gremlin Go hangs.

Root cause analysis:

After debugging, I believe the issue is caused by a deadlock in
gremlinServerWSProtocol.readLoop() that occurs when protocol.transporter.Read()
returns an error.

The following simplified call graph shows what is happening:

gremlinServerWSProtocol.readLoop()
	gorillaTransporter.Read()
	readErrorHandler()
		synchronizedMap.synchronizedRange()
			synchronizedMap.syncLock.Lock()
			channelResultSet.Close()
				channelResultSet.container.delete()
					synchronizedMap.syncLock.Lock()

As you can see, synchronizedMap.syncLock.Lock() is called a second time while
the same goroutine still holds the mutex. Since Go mutexes are not reentrant,
that second call blocks forever, which causes the deadlock.


I'm not familiar with the code base so bear with me if the information I'm
providing is not totally accurate.

Thanks for the hard work! So far, the driver looks amazing!

	Roi Martin

Re: Deadlock in Gremlin Go on read error

Posted by Lyndon Bauto <ly...@bitquilltech.com.INVALID>.
Hey Roi, thank you for the detailed report, it's super helpful!

I just went through what you described in the code, and I think you hit the
nail on the head with the root cause of the issue. It's awesome that you
found this before we made our release candidate!

I or another one of the devs who worked on gremlin-go will start
investigating this immediately and will reply to you here when it is
resolved.


-- 
*Lyndon Bauto*
Team Lead
Bit Quill Technologies Inc.
lyndonb@bitquilltech.com
https://www.bitquilltech.com