Posted to dev@uima.apache.org by "Jerry Cwiklik (JIRA)" <de...@uima.apache.org> on 2010/11/30 17:38:11 UTC

[jira] Issue Comment Edited: (UIMA-1658) UIMA AS worker does not respond to client initialize after failover

    [ https://issues.apache.org/jira/browse/UIMA-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965276#action_12965276 ] 

Jerry Cwiklik edited comment on UIMA-1658 at 11/30/10 11:36 AM:
----------------------------------------------------------------

Jorn, I actually found a workaround for the failover problem using AMQ 5.3.2. The problem you are experiencing is in fact caused by an AMQ bug related to prefetch=0 and failover. As described here:

http://activemq.2283324.n4.nabble.com/jira-Created-AMQ-2877-Failover-and-prefetch-0-can-result-in-hung-consumers-if-the-MessagePull-commant-td2376741.html

that scenario leads to the consumer hang you have been noticing.

A UIMA AS service by default uses prefetch=0 for its input queue. This is done on purpose to enable fair load balancing at a slight cost in throughput. Prefetch=0 means that the broker will not push messages to the consumer; instead, the consumer pulls a single message from the broker whenever it is ready. That behavior facilitates fair load balancing among available service instances. It also reduces the memory footprint of a UIMA AS service, since there are no outstanding messages sitting in AMQ buffers waiting to be processed.
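The load-balancing effect can be illustrated with a small, self-contained simulation. To be clear, this is not UIMA AS or AMQ code: the tick-based model, the message count, and the per-message costs are made up for illustration. With a large prefetch the broker parcels messages out to consumer buffers up front, so a slow instance ends up sitting on a backlog while a fast instance goes idle; with prefetch=0 each instance pulls one message only when it is ready, so the fast instance naturally takes more of the work and everything finishes sooner.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class PrefetchDemo {

    /** One simulated service instance with a local prefetch buffer. */
    static class Consumer {
        final int cost;                          // ticks needed per message
        final Queue<Integer> buffer = new ArrayDeque<>();
        int busy = 0;                            // ticks left on current message
        int processed = 0;
        Consumer(int cost) { this.cost = cost; }
    }

    /**
     * Returns the number of ticks until all messages are processed.
     * prefetch > 0: the broker eagerly pushes up to `prefetch` messages into
     * each consumer's buffer. prefetch == 0: an idle consumer pulls exactly
     * one message when it is ready for work.
     */
    static int makespan(int total, int prefetch, Consumer... consumers) {
        Queue<Integer> broker = new ArrayDeque<>();
        for (int i = 0; i < total; i++) broker.add(i);
        int tick = 0;
        while (true) {
            boolean done = broker.isEmpty();
            for (Consumer c : consumers)
                if (c.busy > 0 || !c.buffer.isEmpty()) done = false;
            if (done) return tick;
            tick++;
            for (Consumer c : consumers) {
                if (prefetch > 0) {              // push model: keep buffers full
                    while (c.buffer.size() < prefetch && !broker.isEmpty())
                        c.buffer.add(broker.poll());
                } else if (c.busy == 0 && c.buffer.isEmpty() && !broker.isEmpty()) {
                    c.buffer.add(broker.poll()); // pull model: one at a time
                }
                if (c.busy == 0 && !c.buffer.isEmpty()) {
                    c.buffer.poll();
                    c.busy = c.cost;             // start the next message
                }
                if (c.busy > 0 && --c.busy == 0) c.processed++;
            }
        }
    }

    public static void main(String[] args) {
        // One fast instance (1 tick/msg) and one slow instance (5 ticks/msg),
        // sharing 20 messages. The pull model finishes well before the push model.
        Consumer fast = new Consumer(1), slow = new Consumer(5);
        int pull = makespan(20, 0, fast, slow);
        System.out.println("prefetch=0:  makespan " + pull + " ticks (fast processed "
                + fast.processed + ", slow " + slow.processed + ")");
        int push = makespan(20, 10, new Consumer(1), new Consumer(5));
        System.out.println("prefetch=10: makespan " + push + " ticks");
    }
}
```

In this toy model the slow instance only takes messages it can actually work on when prefetch is 0, which is exactly the "fair load balancing at a slight cost of throughput" trade-off described above.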

So, the workaround is simply to change the prefetch value in the deployment descriptor to 1, as shown below:

<inputQueue brokerURL="failover:(tcp://x.y.z:portnumber1,tcp://x.y.z:portnumber2)?randomize=false"
            endpoint="queue_name"
            prefetch="1"/>

BTW, the above AMQ JIRA claims that the problem is fixed in AMQ 5.4.1. I've tried it, and failover is *not* working even with prefetch=1; perhaps something else got broken. I have not spent much time with 5.4.1 yet and recommend using 5.3.2 for the time being.


> UIMA AS worker does not respond to client initialize after failover
> -------------------------------------------------------------------
>
>                 Key: UIMA-1658
>                 URL: https://issues.apache.org/jira/browse/UIMA-1658
>             Project: UIMA
>          Issue Type: Bug
>          Components: Async Scaleout
>    Affects Versions: 2.3AS
>         Environment: Ubuntu 8.10 Server, Java 1.6 and ActiveMQ 5.3.0
>            Reporter: Jörn Kottmann
>
> A Pure Master Slave Broker is used to increase availability of the broker. 
> More information about it can be found in the activemq documentation:
> http://activemq.apache.org/pure-master-slave.html 
> In a test we simulated Master failure through killing the process with kill -9.
> Here is the log output from the worker node:
> INFO  FailoverTransport              - Successfully connected to tcp://XXX1:61616
>  Here I stopped the master broker process with kill -9 
> WARN  FailoverTransport              - Transport failed to tcp://XXX1:61616 , attempting to automatically reconnect due to: java.io.EOFException
> WARN  FailoverTransport              - Transport failed to tcp://XXX1:61616 , attempting to automatically reconnect due to: java.io.EOFException
> WARN  FailoverTransport              - Transport failed to tcp://XXX1:61616 , attempting to automatically reconnect due to: java.io.EOFException
> INFO  FailoverTransport              - Successfully reconnected to tcp://XXX2:61616
> INFO  FailoverTransport              - Successfully reconnected to tcp://XXX2:61616
> INFO  FailoverTransport              - Successfully reconnected to tcp://XXX2:61616 
> Afterwards the client was restarted but got a timeout error during initialize.
> During initialize it sends a message to the worker node's input queue, but this
> message is never retrieved.
> I used the activemq web interface to get some information about the message:
> Command     2001
> MessageFrom     ID:XXXX-51032-1257865414664-0:1:1
> ServerURI     failover:(tcp://XXX1:61616,tcp://XXX2:61616)?randomize=false
> MessageType     3000 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.