You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@activemq.apache.org by "Gregg Saffell (Created) (JIRA)" <ji...@apache.org> on 2011/10/03 23:12:34 UTC

[jira] [Created] (AMQ-3522) Bridge connection lost, client side tries to reconnect for a while then stops trying (HTTPS through a proxy server)

Bridge connection lost, client side tries to reconnect for a while then stops trying (HTTPS through a proxy server)
-------------------------------------------------------------------------------------------------------------------

                 Key: AMQ-3522
                 URL: https://issues.apache.org/jira/browse/AMQ-3522
             Project: ActiveMQ
          Issue Type: Bug
    Affects Versions: 5.4.2
         Environment: We have a hub-and-spoke bridge architecture where the hub is the client that connects (over HTTPS) to seventy some servers. A proxy server resides at each server node thus the connection is https over a proxy. The client side of the bridge is running on Linux (CentOS), the server side of bridge is running on Windows 2003. The proxy server is Apache HTTP server. All of this runs over a VPN.

The specific amq build we are using is from FUSE, namely, 5.4.2-fuse-03-09.  On the client side, we have amq logging set as follows:

log4j.rootLogger=INFO, console, logfile
log4j.logger.org.apache.activemq.spring=WARN
log4j.logger.org.apache.activemq.web.handler=WARN
log4j.logger.org.springframework=WARN
log4j.logger.org.apache.xbean=WARN
log4j.logger.org.apache.camel=INFO
log4j.logger.org.apache.activemq.network.DiscoveryNetworkConnector=DEBUG
            Reporter: Gregg Saffell
         Attachments: client-activemq.log, server-access.log, server-wrapper.log

Each server node has a queue that is dedicated to messages going from the server to the client. That queue is therefore expected always to have one consumer (the one that does the bridge forwarding). At least two times per week, we detect that one of the servers shows zero consumers on that queue (in which case messages are not being forwarded across the bridge) and this persists until we stop and restart the client amq instance. Examination of the logs shows that the client side detected the connection had been lost and it retried to establish it for a time but repeatedly failed and then seemed to give up. The wrapper.log file on the server side shows "IDLE SCEP@...[d=false,io=1,w=true,rb=false,wb=false]" for the period where the client is not able to re-establish the connection.

The client-side network connector definition is as follows: 

<networkConnector name="ECENTRE_TSOLAR_EU" uri="static:(https://localhost:61617?   readCheckTime=20000&amp;initialDelayTime=4000&amp;keepAliveResponseRequired=true&amp;proxyHost=ecentre.tsolar.eu&amp;proxyPort=80)?useExponentialBackOff=false&amp;initialReconnectDelay=5000&amp;maxReconnectDelay=5000" duplex="true">
    <dynamicallyIncludedDestinations>
        <queue physicalName="TO_EC.ECENTRE_TSOLAR_EU.&gt;"/>
        <queue physicalName="TO_SCA.&gt;"/>
    </dynamicallyIncludedDestinations>
        <staticallyIncludedDestinations>
			<queue physicalName="TO_EC.ECENTRE_TSOLAR_EU.ASYNC_MSGS"/>
			<queue physicalName="TO_EC.ECENTRE_TSOLAR_EU.SYNC_RESP"/>
			<queue physicalName="TO_SCA.ASYNC_MSGS"/>
			<queue physicalName="TO_SCA.ADMIN.SYNC_REQ"/>
			<queue physicalName="TO_SCA.SSO.SYNC_REQ"/>
	</staticallyIncludedDestinations>
</networkConnector>


The server-side transport connector definition is as follows:

        <transportConnectors>
            <transportConnector name="https" uri="https://127.0.0.1:61617?transport.readCheckTime=20000&amp;transport.initialDelayTime=20000&amp;transport.keepAliveResponseRequired=true&amp;needClientAuth=true"/>
        </transportConnectors>

The proxy server resides on the same machine and is configured with the following virtual host:

<VirtualHost _default_:80>
      ErrorLog logs/default80.ecentre_error.log

      ProxyRequests on
      AllowCONNECT 61617
      ProxyVia on
      LogLevel warn

      RewriteEngine on
      RewriteCond %{REQUEST_METHOD} (GET|POST|HEAD|PUT|DELETE)
      RewriteCond %{REQUEST_URI} !^/eCentreToolServices*
      RewriteCond %{REQUEST_URI} !^/eCentreServices*
      RewriteRule ^(.*)$ https://%{SERVER_NAME}$1 [L,R,NC]
      
      #Take out the TRACE and TRACK
      RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
      RewriteRule .* -[F]

</VirtualHost>

I'm about to attach amq logs from both the client and server along with the apache access log.  The specific remote server that had the problem is ecentre.tsolar.eu. Allow me to direct your attention to some specific entries: 

-------------------
client-activemq.log
-------------------
2011-09-30 07:46:33,357 | WARN -- connection to tsolar shutdown due to a remote error: java.io.IOException: Could not post command: KeepAliveInfo {} due to: java.net.SocketException: Connection reset

(several attempts to re-establish the connection to tsolar)

2011-09-30 07:48:29,566 | INFO -- the last attempt is made to re-establish the connection to tsolar

2011-09-30 14:06:34,816 | INFO -- we restart amq 

2011-09-30 14:07:10,646 | INFO -- connection established to tsolar

-------------------
server-wrapper.log
-------------------
(Note: the server node's clock is around 5 hours 50 minutes ahead of the client's)
The tsolar server 

INFO   | jvm 1    | 2011/09/30 13:38:06 |  WARN | Stopping an existing active duplex connection [Transport Connection to: blockingQueue_8847135] for network connector (ECENTRE_TSOLAR_EU@ID:amq-us01.ilstechnology.net-44428-1317330422396-0:1).

(connection is stopped and re-established several times)

INFO   | jvm 1    | 2011/09/30 13:39:57 |  INFO - Network connection shutdown due to inactivity timeout

(note the series of "IDLE SCEP@... [d=false,io=1,w=true,rb=false,wb=false]" entries here, not sure if they are relevant, they don't appear in activemq.log, only in wrapper.log)

INFO   | jvm 1    | 2011/09/30 19:58:38 |  INFO - connection re-established


------------------------
server apache access log
------------------------
216.244.116.13 - - [30/Sep/2011:13:38:05 +0200] -- begins a series of rapid "CONNECT localhost:61617 HTTP/1.1" 200 entries

216.244.116.13 - - [30/Sep/2011:13:40:04 +0200] -- last in series of rapid "CONNECT localhost:61617 HTTP/1.1" 200 entries

216.244.116.13 - - [30/Sep/2011:19:58:34 +0200] -- the next "CONNECT localhost:61617 HTTP/1.1" 200 entry


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (AMQ-3522) Bridge connection lost, client side tries to reconnect for a while then stops trying (HTTPS through a proxy server)

Posted by "Gregg Saffell (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/AMQ-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gregg Saffell updated AMQ-3522:
-------------------------------

    Attachment: client-activemq.log
                server-wrapper.log
                server-access.log
    
> Bridge connection lost, client side tries to reconnect for a while then stops trying (HTTPS through a proxy server)
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMQ-3522
>                 URL: https://issues.apache.org/jira/browse/AMQ-3522
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.4.2
>         Environment: We have a hub-and-spoke bridge architecture where the hub is the client that connects (over HTTPS) to seventy some servers. A proxy server resides at each server node thus the connection is https over a proxy. The client side of the bridge is running on Linux (CentOS), the server side of bridge is running on Windows 2003. The proxy server is Apache HTTP server. All of this runs over a VPN.
> The specific amq build we are using is from FUSE, namely, 5.4.2-fuse-03-09.  On the client side, we have amq logging set as follows:
> log4j.rootLogger=INFO, console, logfile
> log4j.logger.org.apache.activemq.spring=WARN
> log4j.logger.org.apache.activemq.web.handler=WARN
> log4j.logger.org.springframework=WARN
> log4j.logger.org.apache.xbean=WARN
> log4j.logger.org.apache.camel=INFO
> log4j.logger.org.apache.activemq.network.DiscoveryNetworkConnector=DEBUG
>            Reporter: Gregg Saffell
>         Attachments: client-activemq.log, server-access.log, server-wrapper.log
>
>
> Each server node has a queue that is dedicated to messages going from the server to the client. That queue is therefore expected always to have one consumer (the one that does the bridge forwarding). At least two times per week, we detect that one of the servers shows zero consumers on that queue (in which case messages are not being forwarded across the bridge) and this persists until we stop and restart the client amq instance. Examination of the logs shows that the client side detected the connection had been lost and it retried to establish it for a time but repeatedly failed and then seemed to give up. The wrapper.log file on the server side shows "IDLE SCEP@...[d=false,io=1,w=true,rb=false,wb=false]" for the period where the client is not able to re-establish the connection.
> The client-side network connector definition is as follows: 
> <networkConnector name="ECENTRE_TSOLAR_EU" uri="static:(https://localhost:61617?   readCheckTime=20000&amp;initialDelayTime=4000&amp;keepAliveResponseRequired=true&amp;proxyHost=ecentre.tsolar.eu&amp;proxyPort=80)?useExponentialBackOff=false&amp;initialReconnectDelay=5000&amp;maxReconnectDelay=5000" duplex="true">
>     <dynamicallyIncludedDestinations>
>         <queue physicalName="TO_EC.ECENTRE_TSOLAR_EU.&gt;"/>
>         <queue physicalName="TO_SCA.&gt;"/>
>     </dynamicallyIncludedDestinations>
>         <staticallyIncludedDestinations>
> 			<queue physicalName="TO_EC.ECENTRE_TSOLAR_EU.ASYNC_MSGS"/>
> 			<queue physicalName="TO_EC.ECENTRE_TSOLAR_EU.SYNC_RESP"/>
> 			<queue physicalName="TO_SCA.ASYNC_MSGS"/>
> 			<queue physicalName="TO_SCA.ADMIN.SYNC_REQ"/>
> 			<queue physicalName="TO_SCA.SSO.SYNC_REQ"/>
> 	</staticallyIncludedDestinations>
> </networkConnector>
> The server-side transport connector definition is as follows:
>         <transportConnectors>
>             <transportConnector name="https" uri="https://127.0.0.1:61617?transport.readCheckTime=20000&amp;transport.initialDelayTime=20000&amp;transport.keepAliveResponseRequired=true&amp;needClientAuth=true"/>
>         </transportConnectors>
> The proxy server resides on the same machine and is configured with the following virtual host:
> <VirtualHost _default_:80>
>       ErrorLog logs/default80.ecentre_error.log
>       ProxyRequests on
>       AllowCONNECT 61617
>       ProxyVia on
>       LogLevel warn
>       RewriteEngine on
>       RewriteCond %{REQUEST_METHOD} (GET|POST|HEAD|PUT|DELETE)
>       RewriteCond %{REQUEST_URI} !^/eCentreToolServices*
>       RewriteCond %{REQUEST_URI} !^/eCentreServices*
>       RewriteRule ^(.*)$ https://%{SERVER_NAME}$1 [L,R,NC]
>       
>       #Take out the TRACE and TRACK
>       RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
>       RewriteRule .* -[F]
> </VirtualHost>
> I'm about to attach amq logs from both the client and server along with the apache access log.  The specific remote server that had the problem is ecentre.tsolar.eu. Allow me to direct your attention to some specific entries: 
> -------------------
> client-activemq.log
> -------------------
> 2011-09-30 07:46:33,357 | WARN -- connection to tsolar shutdown due to a remote error: java.io.IOException: Could not post command: KeepAliveInfo {} due to: java.net.SocketException: Connection reset
> (several attempts to re-establish the connection to tsolar)
> 2011-09-30 07:48:29,566 | INFO -- the last attempt is made to re-establish the connection to tsolar
> 2011-09-30 14:06:34,816 | INFO -- we restart amq 
> 2011-09-30 14:07:10,646 | INFO -- connection established to tsolar
> -------------------
> server-wrapper.log
> -------------------
> (Note: the server node's clock is around 5 hours 50 minutes ahead of the client's)
> The tsolar server 
> INFO   | jvm 1    | 2011/09/30 13:38:06 |  WARN | Stopping an existing active duplex connection [Transport Connection to: blockingQueue_8847135] for network connector (ECENTRE_TSOLAR_EU@ID:amq-us01.ilstechnology.net-44428-1317330422396-0:1).
> (connection is stopped and re-established several times)
> INFO   | jvm 1    | 2011/09/30 13:39:57 |  INFO - Network connection shutdown due to inactivity timeout
> (note the series of "IDLE SCEP@... [d=false,io=1,w=true,rb=false,wb=false]" entries here, not sure if they are relevant, they don't appear in activemq.log, only in wrapper.log)
> INFO   | jvm 1    | 2011/09/30 19:58:38 |  INFO - connection re-established
> ------------------------
> server apache access log
> ------------------------
> 216.244.116.13 - - [30/Sep/2011:13:38:05 +0200] -- begins a series of rapid "CONNECT localhost:61617 HTTP/1.1" 200 entries
> 216.244.116.13 - - [30/Sep/2011:13:40:04 +0200] -- last in series of rapid "CONNECT localhost:61617 HTTP/1.1" 200 entries
> 216.244.116.13 - - [30/Sep/2011:19:58:34 +0200] -- the next "CONNECT localhost:61617 HTTP/1.1" 200 entry

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira