Posted to dev@uima.apache.org by "Jerry Cwiklik (JIRA)" <de...@uima.apache.org> on 2010/11/30 17:38:11 UTC
[jira] Issue Comment Edited: (UIMA-1658) UIMA AS worker does not
respond to client initialize after failover
[ https://issues.apache.org/jira/browse/UIMA-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965276#action_12965276 ]
Jerry Cwiklik edited comment on UIMA-1658 at 11/30/10 11:36 AM:
----------------------------------------------------------------
Jorn, I actually found a workaround for the failover problem using AMQ 5.3.2. The problem you are experiencing is in fact caused by an AMQ bug related to prefetch=0 and failover. As described here:
http://activemq.2283324.n4.nabble.com/jira-Created-AMQ-2877-Failover-and-prefetch-0-can-result-in-hung-consumers-if-the-MessagePull-commant-td2376741.html
such a scenario leads to a consumer hang, as you've been noticing.
By default, a UIMA AS service uses prefetch=0 for the service input queue. This is done on purpose to enable fair load balancing at a slight cost in throughput. Prefetch=0 means that the broker will not push messages to the Consumer; instead, the Consumer pulls a single message from the broker whenever it is ready. Such behavior facilitates fair load balancing among the available service instances. It also reduces the memory footprint of the UIMA AS service, as there are no outstanding messages waiting in AMQ buffers to be processed.
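To make the load-balancing argument concrete, here is a rough toy simulation. It is not UIMA AS or ActiveMQ code, and the function name `simulate`, the service times, and the dispatch rules are all assumptions made up for illustration. It models a broker that keeps at most `prefetch` messages in flight per consumer and hands them out round-robin. A pull consumer (prefetch=0) and a prefetch=1 consumer are treated the same here, since both hold one message at a time; the extra round trip of the pull is not modeled.

```python
import heapq


def simulate(num_messages, service_times, prefetch):
    """Toy model of queue dispatch; returns (messages done per consumer, finish time).

    service_times[i] is the (hypothetical) time consumer i needs per message.
    """
    # Assumption: prefetch=0 (pull) and prefetch=1 both mean "one message held
    # at a time" in this model, so they map to the same window size.
    window = max(1, prefetch)
    n = len(service_times)
    pending = num_messages   # messages still sitting in the broker queue
    held = [0] * n           # messages currently assigned to each consumer
    done = [0] * n           # messages each consumer has finished

    # Initial dispatch: round-robin until every window is full or the queue is empty.
    progressed = True
    while pending > 0 and progressed:
        progressed = False
        for i in range(n):
            if pending > 0 and held[i] < window:
                held[i] += 1
                pending -= 1
                progressed = True

    # Each event is (time at which consumer i finishes its current message, i).
    events = [(service_times[i], i) for i in range(n) if held[i] > 0]
    heapq.heapify(events)
    now = 0
    while events:
        now, i = heapq.heappop(events)
        done[i] += 1
        held[i] -= 1
        if pending > 0:
            # The freed slot is refilled from the broker queue right away.
            held[i] += 1
            pending -= 1
        if held[i] > 0:
            heapq.heappush(events, (now + service_times[i], i))
    return done, now


if __name__ == "__main__":
    # One fast consumer (1 time unit per message) and one slow consumer (4 units).
    for prefetch in (0, 1, 10):
        print(prefetch, simulate(20, [1, 4], prefetch))
```

In this toy run with 20 messages, prefetch=0 and prefetch=1 both let the fast consumer take 16 messages and the slow one 4, finishing at time 16. With prefetch=10 each consumer is handed 10 messages up front, and the run only finishes at time 40 because the slow consumer is stuck with its backlog while the fast one sits idle. Real brokers differ in many details, so treat this only as an intuition for why a small prefetch keeps the load fair.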
So, the workaround is simply to change the prefetch value in the deployment descriptor to 1, as shown below:
<inputQueue brokerURL="failover:(tcp://x.y.z:portnumber1,tcp://x.y.z:portnumber2)?randomize=false"
endpoint="queue_name"
prefetch="1"/>
BTW, the above AMQ JIRA claims that the problem is fixed in AMQ 5.4.1. I've tried it, and failover is *not* working even with prefetch=1. Perhaps something else got broken. I have not spent much time with 5.4.1 yet and recommend using 5.3.2 for the time being.
> UIMA AS worker does not respond to client initialize after failover
> -------------------------------------------------------------------
>
> Key: UIMA-1658
> URL: https://issues.apache.org/jira/browse/UIMA-1658
> Project: UIMA
> Issue Type: Bug
> Components: Async Scaleout
> Affects Versions: 2.3AS
> Environment: Ubuntu 8.10 Server, Java 1.6 and ActiveMQ 5.3.0
> Reporter: Jörn Kottmann
>
> A Pure Master Slave Broker is used to increase availability of the broker.
> More information about it can be found in the activemq documentation:
> http://activemq.apache.org/pure-master-slave.html
> In a test we simulated Master failure through killing the process with kill -9.
> Here is the log output from the worker node:
> INFO FailoverTransport - Successfully connected to tcp://XXX1:61616
> Here I stopped the master broker process with kill -9
> WARN FailoverTransport - Transport failed to tcp://XXX1:61616 , attempting to automatically reconnect due to: java.io.EOFException
> WARN FailoverTransport - Transport failed to tcp://XXX1:61616 , attempting to automatically reconnect due to: java.io.EOFException
> WARN FailoverTransport - Transport failed to tcp://XXX1:61616 , attempting to automatically reconnect due to: java.io.EOFException
> INFO FailoverTransport - Successfully reconnected to tcp://XXX2:61616
> INFO FailoverTransport - Successfully reconnected to tcp://XXX2:61616
> INFO FailoverTransport - Successfully reconnected to tcp://XXX2:61616
> Afterwards the client was restarted but got a timeout error during initialize.
> During initialize it sends a message to the worker node's input queue, but this
> message is never retrieved.
> I used the activemq web interface to get some information about the message:
> Command 2001
> MessageFrom ID:XXXX-51032-1257865414664-0:1:1
> ServerURI failover:(tcp://XXX1:61616,tcp://XXX2:61616)?randomize=false
> MessageType 3000