Posted to users@activemq.apache.org by David Martin <da...@qoritek.com> on 2021/03/11 13:52:54 UTC

Local resilience for Artemis

Hi,

I'm looking to host an Artemis cluster in Kubernetes and am not sure how to
achieve full local resilience. (Clusters for DR and remote distribution
will be added later using the mirroring feature introduced in v2.16.)

It is configured as 3 active cluster members using static discovery because
the particular cloud provider does not officially support UDP on its
managed Kubernetes service network.

There are no backup brokers (active/passive) because the stateful set takes
care of restarting failed pods immediately.

Each broker has its own networked storage so is resilient in terms of local
state.

Message redistribution is ON_DEMAND. Publishing is to topics and consuming
is from durable topic subscription queues.

Publishers and consumers are connecting round-robin with client IP
affinity/stickiness.
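For context, the cluster setup described above might be sketched roughly as follows in each broker's broker.xml. This is purely illustrative — the connector names and service hostnames are placeholders, not taken from the actual deployment:

```xml
<!-- Hypothetical excerpt from broker.xml on broker-0; names are illustrative -->
<connectors>
   <!-- this broker's own connector -->
   <connector name="artemis">tcp://broker-0.artemis-svc:61616</connector>
   <!-- the other two cluster members, listed statically (no UDP discovery) -->
   <connector name="broker-1">tcp://broker-1.artemis-svc:61616</connector>
   <connector name="broker-2">tcp://broker-2.artemis-svc:61616</connector>
</connectors>

<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>artemis</connector-ref>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <static-connectors>
         <connector-ref>broker-1</connector-ref>
         <connector-ref>broker-2</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>
```

With ON_DEMAND, a message is only forwarded to another broker when that broker has a matching consumer and the local one does not — which is what produces the Broker 1 -> Broker 2 hops in the scenario described here.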

What I'm concerned about is the possibility of journal corruption on one
broker. Publishers and consumers will failover to either of the remaining 2
brokers which is fine but some data could be lost permanently as follows.

Hypothetically, consider that Publisher 1 is publishing to Broker 1 and
Publisher 2 is publishing to Broker 3. Consumer 1 is consuming from Broker
2 and Consumer 2 is consuming from Broker 1. There are more consumers and
publishers, but two of each suffice to illustrate.

Publisher 1 -> Broker 1 -> Broker 2 -> Consumer 1
Publisher 2 -> Broker 3 -> Broker 2 -> Consumer 1
Publisher 1 -> Broker 1 -> Consumer 2
Publisher 2 -> Broker 3 -> Broker 1 -> Consumer 2

This all works very well with full data integrity and good performance :)

However, if say Broker 1's journal got corrupted and it went down
permanently as a result, any data from Publisher 1 which hadn't yet been
distributed to Consumer 1 (via Broker 2) or, *particularly*, to Consumer 2
(directly) would be lost (unless the journal could be recovered).

Is there some straightforward configuration to avoid or reduce this
possibility? Perhaps a 4 broker cluster could have affinity for publishers
on 2 brokers and affinity for consumers on the other 2, somehow?


Thanks for any advice you can offer.


Dave Martin.

Re: Local resilience for Artemis

Posted by David Martin <da...@qoritek.com>.
Sorry, just to add: I could create a Kubernetes Service for publishers with
affinity to 2 of 4 brokers, and another Service for consumers with affinity
to the other 2, but I'm looking for something more dynamic if possible, so
it can scale out seamlessly.
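For the record, a static version of that split might look like the sketch below. All names and labels are hypothetical, and a real setup would also need the per-pod "broker-group" labels to exist on the stateful set's pods:

```yaml
# Hypothetical: split client traffic by role across a 4-broker stateful set.
# Assumes pods broker-0..broker-3 carry an extra label "broker-group".
apiVersion: v1
kind: Service
metadata:
  name: artemis-publishers
spec:
  selector:
    app: artemis
    broker-group: publishers   # e.g. broker-0 and broker-1
  ports:
    - port: 61616
  sessionAffinity: ClientIP    # client IP stickiness, as in the current setup
---
apiVersion: v1
kind: Service
metadata:
  name: artemis-consumers
spec:
  selector:
    app: artemis
    broker-group: consumers    # e.g. broker-2 and broker-3
  ports:
    - port: 61616
  sessionAffinity: ClientIP
```

As noted, this is static: scaling out would mean re-labelling pods, so it doesn't address the dynamic case.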


On Thu, 11 Mar 2021 at 13:52, David Martin <da...@qoritek.com> wrote:

Re: Local resilience for Artemis

Posted by Clebert Suconic <cl...@gmail.com>.
@David, please reach out to me/us if you hit any issues with the new
functionality.

On Thu, Mar 11, 2021 at 5:12 PM David Martin <da...@qoritek.com> wrote:

-- 
Clebert Suconic

Re: Local resilience for Artemis

Posted by David Martin <da...@qoritek.com>.
Many thanks for the advice Clebert.

I've just had to deal with journal corruption headaches with other
messaging middleware in the past.

It does seem like an edge case for Artemis, and with mirroring now available
I'll prioritise the DR solution. AMQP is already the protocol used
throughout.


Dave


On Thu, Mar 11, 2021, 9:48 PM Clebert Suconic, <cl...@gmail.com>
wrote:


Re: Local resilience for Artemis

Posted by Clebert Suconic <cl...@gmail.com>.
If you are that concerned about losing the journal (which I believe would
be pretty hard to have happen), I would recommend using the Mirror.

Note: the Mirror sends the message as AMQP, so if you send Core, the
message will be converted to AMQP over the wire (AMQP connection).

I have been thinking about embedding CoreMessage inside AMQP. There would
still be some inefficiency crossing protocols, but it would avoid
conversion issues.
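For anyone finding this thread later: the mirror referred to here is configured as an AMQP broker connection in broker.xml. A minimal sketch — the URI and connection name are placeholders:

```xml
<!-- Hypothetical excerpt from broker.xml: mirror all messages to a DR broker over AMQP -->
<broker-connections>
   <amqp-connection uri="tcp://dr-broker:61616" name="dr-mirror">
      <mirror/>
   </amqp-connection>
</broker-connections>
```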

On Thu, Mar 11, 2021 at 1:31 PM Clebert Suconic
<cl...@gmail.com> wrote:



-- 
Clebert Suconic

Re: Local resilience for Artemis

Posted by Clebert Suconic <cl...@gmail.com>.
The journal getting corrupted could happen in 2 situations:

- The file system is damaged by the infrastructure (hardware failures,
kernel issues, etc.). If you have a reliable file system here, I'm not
sure how concerned you should be.

- Some invalid data in the journal causes the broker to fail on restart.
I have seen only a handful of issues raised like this, and as with any
bug, we fix them when reported. I am not aware of any at the moment.

So I think it would be considerably safe to simply reconnect the pod.

A damaged file system or journal after a failure is, IMO, a disaster
situation, and for that I can only think of the mirror to mitigate any of
it.
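As a side note, if a journal ever does need inspecting, the broker's data is not a complete black box: the Artemis CLI can dump or export journal contents. A sketch, assuming the commands are run from the broker instance directory while the broker is stopped (paths are placeholders and exact options vary by version):

```shell
# Print a human-readable listing of journal records:
./bin/artemis data print

# Export the broker data to XML, which can later be re-imported:
./bin/artemis data exp > journal-export.xml
```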

On Thu, Mar 11, 2021 at 8:53 AM David Martin <da...@qoritek.com> wrote:

-- 
Clebert Suconic