You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tinkerpop.apache.org by "Stephen Mallette (Jira)" <ji...@apache.org> on 2022/08/08 12:03:00 UTC

[jira] [Updated] (TINKERPOP-2769) gremlin-server does not reply with a timeout response to all timed out requests

     [ https://issues.apache.org/jira/browse/TINKERPOP-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Mallette updated TINKERPOP-2769:
----------------------------------------
    Component/s: server

> gremlin-server does not reply with a timeout response to all timed out requests
> -------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-2769
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2769
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.6.0, 3.5.3, 3.6.1, 3.5.4
>            Reporter: Roi Martin
>            Assignee: Stephen Mallette
>            Priority: Major
>             Fix For: 3.7.0, 3.6.2, 3.5.5
>
>
> The following reproducer hangs because not all timed out requests are reported by gremlin-server.
> *main.go*
> {code:go}
> package main
> import (
> 	"log"
> 	"sync"
> 	"sync/atomic"
> 	gremlingo "github.com/apache/tinkerpop/gremlin-go/v3/driver"
> )
> const (
> 	gremlinServerEndpoint = "ws://localhost:8182/gremlin"
> 	nWorkers              = 250
> )
> func main() {
> 	conn, err := gremlingo.NewDriverRemoteConnection(gremlinServerEndpoint, func(settings *gremlingo.DriverRemoteConnectionSettings) {
> 		settings.LogVerbosity = gremlingo.Off
> 	})
> 	if err != nil {
> 		log.Fatalf("could not create remote connection: %v", err)
> 	}
> 	defer conn.Close()
> 	g := gremlingo.Traversal_().WithRemote(conn)
> 	var (
> 		wg               sync.WaitGroup
> 		nPendingRequests int32
> 	)
> 	for i := 0; i < nWorkers; i++ {
> 		wg.Add(1)
> 		go func(id int) {
> 			defer wg.Done()
> 			atomic.AddInt32(&nPendingRequests, 1)
> 			result, err := g.
> 				Inject(1).
> 				SideEffect(&gremlingo.Lambda{"Thread.sleep(5_000)", "gremlin-groovy"}).
> 				Next()
> 			npr := atomic.AddInt32(&nPendingRequests, -1)
> 			log.Printf("[%v] result=%v err=%v nPendingRequests=%v", id, result, err, npr)
> 		}(i)
> 	}
> 	log.Print("waiting")
> 	wg.Wait()
> 	log.Print("done")
> }
> {code}
> Steps to reproduce the issue:
> 1. Run gremlin-server:
> {noformat}
> docker run --rm -ti -p 8182:8182 tinkerpop/gremlin-server:3.5.4-SNAPSHOT 2>&1 | tee log
> {noformat}
> 2. Execute the reproducer:
> {noformat}
> go run main.go
> {noformat}
> 3. Wait ~30s for the reproducer to hang.
> All these tests has been performed against the default gremlin-server configuration. It is important to note that in this machine the gremlin pool is set to 8. Higher values may require to adjust the number of workers in the reproducer to trigger the issue.
> After following these steps, the output of the reproducer should look like this:
> {noformat}
> 2022/06/29 16:08:47 waiting
> 2022/06/29 16:08:52 [22] result=result{object=1 class=int64} err=<nil> nPendingRequests=249
> ...
> 2022/06/29 16:09:17 [154] result=result{object=1 class=int64} err=<nil> nPendingRequests=208
> 2022/06/29 16:09:17 [243] result=result{object=1 class=int64} err=<nil> nPendingRequests=207
> 2022/06/29 16:09:17 [226] result=<nil> err=E0903: there are no results left nPendingRequests=206
> 2022/06/29 16:09:17 [164] result=<nil> err=E0903: there are no results left nPendingRequests=204
> ...
> 2022/06/29 16:09:17 [131] result=<nil> err=E0903: there are no results left nPendingRequests=113
> {noformat}
> On the other side, the gremlin-server logs should look like this:
> {noformat}
> [INFO] ? - Channel started at port 8182.
> [WARN] ? - A timeout occurred during traversal evaluation of [RequestMessage{, requestId=9d7ccc61-0e7e-4105-ae35-eb81a3895170, op='bytecode', processor='traversal', args={gremlin=[[], [inject(1), sideEffect(lambda[Thread.sleep(5_000)])]], aliases={g=g}}}] - consider increasing the limit given to evaluationTimeout
> [WARN] ? - A timeout occurred during traversal evaluation of [RequestMessage{, requestId=6f2d939a-31c9-48fd-bb54-b2bfa1fbd4de, op='bytecode', processor='traversal', args={gremlin=[[], [inject(1), sideEffect(lambda[Thread.sleep(5_000)])]], aliases={g=g}}}] - consider increasing the limit given to evaluationTimeout
> ...
> {noformat}
> In this specific case,
>  - 43 requests finished successfully (250-207 = 43)
>  - 94 requests timed out (207-113 = 94)
>  - 113 requests never finished (making the reproducer hang)
> If we check the gremlin-server logs, the number of reported timeouts matches the output of the reproducer, which makes me think that gremlin-server is the culprit.
> {noformat}
> $ grep 'A timeout occurred' log | wc -l
> 94
> {noformat}
> Inspecting the hanging goroutines in the reproducer shows that they are IO waiting:
> {noformat}
> (dlv) grs
> ...
>   Goroutine 373 - User: /home/n/goroot/src/net/fd_posix.go:55 net.(*netFD).Read (0x684273) [IO wait]
> ...
> (dlv) gr 373
> Switched from 0 to 373 (thread 709858)
> (dlv) bt
>  0  0x000000000043d212 in runtime.gopark
>     at /home/n/goroot/src/runtime/proc.go:362
>  1  0x0000000000435a4a in runtime.netpollblock
>     at /home/n/goroot/src/runtime/netpoll.go:522
>  2  0x0000000000465e05 in internal/poll.runtime_pollWait
>     at /home/n/goroot/src/runtime/netpoll.go:302
>  3  0x00000000004ff2c8 in internal/poll.(*pollDesc).wait
>     at /home/n/goroot/src/internal/poll/fd_poll_runtime.go:83
>  4  0x00000000004ff377 in internal/poll.(*pollDesc).waitRead
>     at /home/n/goroot/src/internal/poll/fd_poll_runtime.go:88
>  5  0x0000000000500f8f in internal/poll.(*FD).Read
>     at /home/n/goroot/src/internal/poll/fd_unix.go:167
>  6  0x0000000000684273 in net.(*netFD).Read
>     at /home/n/goroot/src/net/fd_posix.go:55
>  7  0x000000000069ce49 in net.(*conn).Read
>     at /home/n/goroot/src/net/net.go:183
>  8  0x00000000005d9eb4 in bufio.(*Reader).fill
>     at /home/n/goroot/src/bufio/bufio.go:106
>  9  0x00000000005da278 in bufio.(*Reader).Peek
>     at /home/n/goroot/src/bufio/bufio.go:144
> 10  0x00000000008298a9 in github.com/gorilla/websocket.(*Conn).read
>     at /home/n/go/pkg/mod/github.com/gorilla/websocket@v1.5.0/conn.go:371
> 11  0x000000000082d5fd in github.com/gorilla/websocket.(*Conn).advanceFrame
>     at /home/n/go/pkg/mod/github.com/gorilla/websocket@v1.5.0/conn.go:809
> 12  0x000000000082f1d1 in github.com/gorilla/websocket.(*Conn).NextReader
>     at /home/n/go/pkg/mod/github.com/gorilla/websocket@v1.5.0/conn.go:1009
> 13  0x000000000082fccb in github.com/gorilla/websocket.(*Conn).ReadMessage
>     at /home/n/go/pkg/mod/github.com/gorilla/websocket@v1.5.0/conn.go:1093
> 14  0x00000000008418fb in github.com/apache/tinkerpop/gremlin-go/v3/driver.(*gorillaTransporter).Read
>     at /home/n/go/pkg/mod/github.com/apache/tinkerpop/gremlin-go/v3@v3.5.4-rc4.0.20220628161437-04f47d242010/driver/gorillaTransporter.go:116
> 15  0x000000000085a302 in github.com/apache/tinkerpop/gremlin-go/v3/driver.(*gremlinServerWSProtocol).readLoop
>     at /home/n/go/pkg/mod/github.com/apache/tinkerpop/gremlin-go/v3@v3.5.4-rc4.0.20220628161437-04f47d242010/driver/protocol.go:57
> 16  0x000000000085c2e7 in github.com/apache/tinkerpop/gremlin-go/v3/driver.newGremlinServerWSProtocol.func1
>     at /home/n/go/pkg/mod/github.com/apache/tinkerpop/gremlin-go/v3@v3.5.4-rc4.0.20220628161437-04f47d242010/driver/protocol.go:196
> 17  0x000000000046a841 in runtime.goexit
>     at /home/n/goroot/src/runtime/asm_amd64.s:1571
> {noformat}
> The reason why the go program does not finish is that gremlin-server keeps responding to pings, therefore gremlin-go's pong handler keeps extending the read deadline of the connection.
> */tinkerpop/gremlin-go/driver/gorillaTransporter.go*
> {code:go}
> 	transporter.connection.SetPongHandler(func(string) error {
> 		err := transporter.connection.SetReadDeadline(time.Now().Add(2 * transporter.connSettings.keepAliveInterval))
> 		if err != nil {
> 			return err
> 		}
> 		return nil
> 	})
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)