Posted to users@activemq.apache.org by andrea bisogno <bi...@hotmail.it> on 2023/12/18 11:32:12 UTC

MQTT stealing link issues with Artemis deployed on Kubernetes in HA

Hi team,

I'm facing some unexpected MQTT stealing link issues with Artemis deployed on Kubernetes in High Availability (i.e. with the broker pods number >= 2).

I've described the test scenario, and the corresponding unexpected behavior, here: https://github.com/artemiscloud/activemq-artemis-operator/discussions/756

Can you help me with this?

Many thanks in advance,


Andrea Bisogno

Re: MQTT stealing link issues with Artemis deployed on Kubernetes in HA

Posted by andrea bisogno <bi...@hotmail.it>.
Hi Justin,
many thanks for your reply and sorry for the wrong terminology I used.

Andrea
________________________________
From: Justin Bertram <jb...@apache.org>
Sent: Monday, December 18, 2023 20:59
To: users@activemq.apache.org <us...@activemq.apache.org>
Subject: Re: MQTT stealing link issues with Artemis deployed on Kubernetes in HA


Re: MQTT stealing link issues with Artemis deployed on Kubernetes in HA

Posted by Justin Bertram <jb...@apache.org>.
After looking at this a bit longer I believe I see what's happening.
Section 3.1.4 of the MQTT 5 specification states:

> If the ClientID represents a Client already connected to the Server, the
> Server sends a DISCONNECT packet to the existing Client with Reason Code
> of 0x8E (Session taken over) as described in section 4.13 and MUST close
> the Network Connection of the existing Client.

There's a similar statement in the same section of the 3.1.1 specification.

In a cluster like the one you have configured this "link stealing" works,
and it is relatively straightforward to implement: the node receiving the
new connection simply sends a notification about the client ID to all the
other cluster members, and if a connection on any of those nodes is using
that client ID then it gets closed.

However, you've configured the opposite behavior (i.e.
allowLinkStealing=false), which means that instead of kicking off the
existing client the incoming client is denied. This is actually a much
harder problem to solve in a cluster because it requires not only a
notification about the incoming client but a *response* from every other
node in the cluster indicating whether or not a client with that same
client ID is already connected. This kind of synchronous data exchange
between nodes is at odds with the scalability that a cluster is designed
to provide and can't/won't be solved in the same way.
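
For reference, a minimal sketch of how that setting might look on an MQTT
acceptor in broker.xml (the acceptor name and port here are assumptions,
not taken from your configuration):

```xml
<!-- Hypothetical MQTT acceptor; the allowLinkStealing URL parameter
     controls whether an incoming connection with a duplicate client ID
     kicks off the existing one (true) or is itself denied (false) -->
<acceptor name="mqtt">tcp://0.0.0.0:1883?protocols=MQTT;allowLinkStealing=false</acceptor>
```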

The way to solve this problem is to use a connection router [1] to ensure
that clients using the same ID always get connected to the same node in the
cluster. Based on your description you already see that denying the
incoming connection works as expected when the clients connect to the same
node. Using a connection router just ensures that happens all the time.
I'll add a note to the documentation to make this clearer.
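
As a rough sketch only (the router and cluster-connection names below are
assumptions), a client-ID-based connection router might look along these
lines in broker.xml:

```xml
<connection-routers>
   <!-- Routes each incoming connection by a consistent hash of its
        client ID, so the same client ID always lands on the same node -->
   <connection-router name="client-id-router">
      <key-type>CLIENT_ID</key-type>
      <policy name="CONSISTENT_HASH"/>
      <pool>
         <cluster-connection>my-cluster</cluster-connection>
      </pool>
   </connection-router>
</connection-routers>
```

The acceptor would then reference the router via its router URL parameter,
e.g. ...;router=client-id-router. See the documentation at [1] for the
actual options.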


Justin

[1]
https://activemq.apache.org/components/artemis/documentation/latest/connection-routers.html#connection-routers




Re: MQTT stealing link issues with Artemis deployed on Kubernetes in HA

Posted by Justin Bertram <jb...@apache.org>.
Can you work up a reproducer that doesn't involve that operator?

For what it's worth, the terminology you used in your description seems
fundamentally ambiguous. You talk about "HA mode", "replicas", etc. This
terminology has a specific meaning in ActiveMQ Artemis and apparently a
different meaning in Kubernetes. For example, in ActiveMQ Artemis HA is
supported via active/passive broker pairs, whereas, from what I can tell,
HA in Kubernetes is just multiple "pods" running the same configuration -
something that would generally be referred to simply as a "cluster" in
ActiveMQ Artemis. Therefore, when you use these terms to describe your
use-case it becomes unclear what the actual broker configuration is -
which is mainly what we (on this list) care about.
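
To illustrate the distinction, Artemis HA is configured per broker (as one
half of an active/passive pair) via an ha-policy in broker.xml - a minimal
sketch, with replication chosen arbitrarily over shared storage, and noting
that element names vary across Artemis versions (older releases use
master/slave rather than primary/backup):

```xml
<!-- The active (primary) broker of an active/passive pair -->
<ha-policy>
   <replication>
      <primary/>
   </replication>
</ha-policy>
```

Simply scaling a Kubernetes Deployment to N pods produces no such pairing;
each pod is just another live broker.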


Justin
