Posted to users@activemq.apache.org by Dan Langford <da...@gmail.com> on 2019/03/28 18:56:28 UTC

Re: (Artemis) Lost Messages with Colocated Backup Scale-down Failback

I happen to be working on this issue with Mr. Pyle. Is anybody out there
successfully running Artemis with colocated backups and scale-down who can
recover a failed node and have the recovered node reestablish a backup? If
so, we would love to compare configuration notes. Is this a configuration
that we should expect to work, or are we mixing together incompatible
concepts? Would it be appropriate for us to create a JIRA issue for the
problem we are seeing?

On Fri, Jan 25, 2019 at 11:54 AM Jason Pyle <se...@gmail.com> wrote:

> I sent this last week via Nabble, but I don't think it got mailed out to
> anybody. If it did, I apologize for the spam.
>
> We're developing a strategy for backups and HA. Ideally we'd like to use
> colocated backups to ensure data integrity and availability with scale-down
> configured from the slave to the master host.
>
> We ran into an issue when bringing a server back up. Consider this
> situation.
>
> Servers 1 and 2 are brought up and create colocated backups 1b and 2b, with
> 1b living on server 2 and 2b living on server 1. If I bring server 2
> offline, 2b comes online and then scales down into server 1 as intended.
> When I bring server 2 back up, 2b does not fail back. This leads to server 2
> starting an infinite vote loop to find another server to create a backup
> for it. Since server 1 already holds backup 2b and is configured for only 1
> backup, it replies indefinitely that it does not have space for another
> backup.
>
> In this state, if more messages are sent to server 2 and server 2 then
> crashes, those messages are lost.
>
> I've created an example of this problem, based on one of the examples in the
> Artemis source, here:
> https://github.com/SethPyle376/colocated-scaledown-problem
>
> I've tested this situation with both replication and shared-store, and the
> problem persists. Any help would be great; we need colocated scale-down
> failback working correctly.
>
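
For anyone trying to follow the sequence in the quoted message, here is a
minimal toy model of it in Python. It is only a sketch of the behavior as we
observed it, not of how Artemis works internally: the Broker class, the
simulate function, and the message names m1/m2 are all made up for
illustration, and the key assumption (that server 1 keeps counting the 2b
slot as occupied after scale-down) is simply what the report describes.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical stand-in for one live server that can host backups."""
    name: str
    max_backups: int = 1  # mirrors "only configured for 1 backup"
    hosted_backups: list = field(default_factory=list)
    messages: list = field(default_factory=list)
    alive: bool = True

    def request_backup_on(self, host: "Broker") -> bool:
        """Ask `host` to start a backup for this broker; True if accepted."""
        if not host.alive or len(host.hosted_backups) >= host.max_backups:
            return False  # host is down or reports no space for another backup
        host.hosted_backups.append(self.name)
        return True


def simulate(retries: int = 3) -> dict:
    s1, s2 = Broker("server1"), Broker("server2")

    # Startup: each server gets a colocated backup on the other (1b, 2b).
    assert s1.request_backup_on(s2) and s2.request_backup_on(s1)
    s2.messages.append("m1")

    # Server 2 goes offline; 2b activates and scales down into server 1.
    s2.alive = False
    s1.messages.extend(s2.messages)
    s2.messages.clear()
    s2.hosted_backups.clear()  # 1b lived on server 2 and went down with it

    # Server 2 returns. Assumption taken from the report: server 1 still
    # counts the 2b slot as occupied, so every new request is refused.
    s2.alive = True
    accepted = [s2.request_backup_on(s1) for _ in range(retries)]

    # New traffic reaches server 2 while it has no working backup, then it
    # crashes again.
    s2.messages.append("m2")
    s2.alive = False
    lost = [] if any(accepted) else list(s2.messages)

    return {
        "backup_requests_accepted": accepted,
        "survived_on_server1": list(s1.messages),
        "lost": lost,
    }


if __name__ == "__main__":
    print(simulate())
```

In this model the first message survives because 2b scaled it down into
server 1, while the second is lost because every backup request after the
restart is refused. The real broker retries indefinitely; the retries
parameter only bounds the loop so the sketch terminates.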


-- 
Dan Langford

801.683.0213
http://google.com/profiles/danlangford

Re: (Artemis) Lost Messages with Colocated Backup Scale-down Failback

Posted by Dan Langford <da...@gmail.com>.
Seth Pyle,
for the record, I stumbled across another user with the same issue. It looks
like there is already a JIRA issue for it:
https://issues.apache.org/jira/browse/ARTEMIS-2165
That issue links to a couple of SO questions in the same area. I will add
your example test case to that JIRA ticket in case it can be of any help to
the devs.
