Posted to issues@guacamole.apache.org by "Douglas Heriot (Jira)" <ji...@apache.org> on 2020/03/19 12:22:00 UTC

[jira] [Comment Edited] (GUACAMOLE-919) An I/O error occurred while sending to the backend

    [ https://issues.apache.org/jira/browse/GUACAMOLE-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062493#comment-17062493 ] 

Douglas Heriot edited comment on GUACAMOLE-919 at 3/19/20, 12:21 PM:
---------------------------------------------------------------------

Ok, so today I had some time to look into this again. The only way I've found to reproduce the issue is to wait a few hours after Guacamole was last used by anyone.

 

Running netstat in the guacamole container just after a blocked request shows there is some data in the Send-Q of the Postgres connection that hasn't been acknowledged. A tcpdump confirmed this: the query is sent and then retransmitted at the TCP level, as there's no response from the database.

 
{code:java}
# nsenter -t 32264 -n netstat -n
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0     52 172.17.0.9:54414        10.10.0.236:5432       ESTABLISHED
tcp        0      0 172.19.0.3:8080         172.19.0.4:41012        TIME_WAIT
tcp        0      0 172.17.0.9:54436        10.10.0.236:5432       ESTABLISHED
tcp        0      0 172.19.0.3:8080         172.19.0.4:40988        TIME_WAIT
tcp        0      0 172.19.0.3:8080         172.19.0.4:41002        ESTABLISHED
tcp        0      0 172.19.0.3:8080         172.19.0.4:41008        ESTABLISHED
tcp        0      0 172.19.0.3:8080         172.19.0.4:40986        TIME_WAIT
tcp        0      0 172.19.0.3:8080         172.19.0.4:41010        TIME_WAIT{code}
 

I'm guessing that AWS RDS is just dropping long-lived idle connections without closing them? Or it could be some other network configuration issue we need to solve?

 

With this clearer picture of what's happening, I had a closer look at how it's being handled. I've not yet run Guacamole in a debugger, but reading through the code I think I've found where this happens in the MyBatis library.

There are a few interesting log messages (from the initial post) that show which code path is being taken. The first interesting one is "Testing connection", which happens on the page requests where nothing loads.

Then when Guacamole starts working again, we see log messages like "Execution of ping query 'SELECT 1' failed: An I/O error occurred while sending to the backend".

These seem to come from here:

[https://github.com/mybatis/mybatis-3/blob/cb11469b3118c5d1d0501c489dcfe196d7c8568d/src/main/java/org/apache/ibatis/datasource/pooled/PooledDataSource.java#L555]

 

It looks like when a Postgres connection is returned from the connection pool, if it's old then it is tested with SELECT 1 before it's used. In the case where the TCP connection is still in the ESTABLISHED state but the database is not acknowledging any TCP segments, we hang here indefinitely. I haven't caught the moment when Guacamole starts working again, but I guess the TCP connections are finally closed (perhaps when the kernel's retransmission timer gives up), or transition to a state where an I/O error is actually returned, so that MyBatis can go and fetch another connection.
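This is not the actual MyBatis code path, but a minimal, self-contained sketch (all names here are invented for the example) of the general fix: run the ping on a worker thread and bound it with a deadline, so that a black-holed connection registers as a failed ping instead of hanging the request indefinitely.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedPing {

    // Hypothetical stand-in for a pooled connection whose ping can block forever.
    public interface Conn {
        boolean ping() throws Exception;
    }

    // Run the ping on a worker thread and give up after timeoutMs,
    // treating a timed-out ping the same as a failed one.
    public static boolean pingWithTimeout(Conn c, long timeoutMs) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> result = executor.submit(c::ping);
            try {
                return result.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                result.cancel(true); // interrupt the stuck ping
                return false;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Conn healthy = () -> true;
        // Simulates SELECT 1 on a black-holed TCP connection: it never returns.
        Conn hung = () -> { Thread.sleep(60_000); return true; };
        System.out.println(pingWithTimeout(healthy, 500)); // prints: true
        System.out.println(pingWithTimeout(hung, 500));    // prints: false
    }
}
```

In practice, setting a JDBC query timeout (Statement.setQueryTimeout) achieves the same bound inside the driver, which is what the defaultStatementTimeout idea below relies on.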

 

How do we fix this? I've got an idea I'm about to try. MyBatis has a setting "defaultStatementTimeout" that defaults to null (no timeout).

[https://mybatis.org/mybatis-3/configuration.html]

I think this can be set with the "mybatis.configuration.defaultStatementTimeout" configuration setting:

[https://github.com/mybatis/guice/blob/master/src/main/java/org/mybatis/guice/configuration/ConfigurationProvider.java#L86]
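Assuming mybatis-guice really does map that key onto Configuration.setDefaultStatementTimeout(), the properties fragment might look like this (the 5-second value is an arbitrary guess to be tuned; MyBatis passes it through to JDBC's Statement.setQueryTimeout, which is in seconds):

{code}
# Hypothetical properties fragment - property name taken from the
# mybatis-guice ConfigurationProvider source linked above.
mybatis.configuration.defaultStatementTimeout=5
{code}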

I'll have a go at adding this in and making it configurable. I've posted on the MyBatis mailing list asking what they recommend here: rather than setting defaultStatementTimeout, which applies to all statements, there's also a "poolTimeToWait" setting that's supposed to be a timeout for getting connections; however, it currently only applies to opening new connections, not to pinging existing ones.
[https://groups.google.com/forum/#!topic/mybatis-user/o8Ib0SUXOU0]

 

The other thing I'm going to try to resolve this: use pgBouncer as a proxy on the same host as Guacamole, to hopefully deal with connections timing out in a better way. I'm not sure yet whether that will help or how it handles this case.
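For reference, a minimal pgbouncer.ini sketch of that setup (the database name and auth file are placeholders; the host is the RDS address from the netstat output above, and the timeout values are guesses to be tuned):

{code}
[databases]
; "guacamole_db" is a placeholder database name
guacamole_db = host=10.10.0.236 port=5432 dbname=guacamole_db

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session
; recycle server connections before RDS can silently drop them
server_lifetime = 3600
server_idle_timeout = 600
; verify a server connection before handing it to a client
server_check_query = select 1
server_check_delay = 30
{code}

The idea is that pgBouncer, being on the same host, notices dead server connections itself and opens fresh ones, so Guacamole's pool never sees a black-holed socket.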



> An I/O error occurred while sending to the backend
> --------------------------------------------------
>
>                 Key: GUACAMOLE-919
>                 URL: https://issues.apache.org/jira/browse/GUACAMOLE-919
>             Project: Guacamole
>          Issue Type: Bug
>          Components: guacamole-auth-jdbc-postgresql
>    Affects Versions: 1.0.0
>            Reporter: Mechanix
>            Assignee: Nick Couchman
>            Priority: Minor
>         Attachments: image-2020-01-27-15-19-26-634.png
>
>
> Hi,
> we use Guacamole with PostgreSQL and the OpenID extension. Guacamole and guacd are deployed inside a k8s cluster.
> For some reason, authentication sporadically doesn't succeed; there is only a blank page and this error message in the guacamole log:
> *[pool-1-thread-1] WARN o.a.i.d.pooled.PooledDataSource - Execution of ping query 'SELECT 1' failed: An I/O error occurred while sending to the backend.*
> I suspect there is a weird timeout happening between Guacamole and PostgreSQL but couldn't figure out why.
> Any hints are much appreciated. Thanks
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)