You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@activemq.apache.org by "iiliev2 (via GitHub)" <gi...@apache.org> on 2024/04/22 16:58:26 UTC

[PR] ARTEMIS-4305 Zero persistence does not work in kubernetes [activemq-artemis]

iiliev2 opened a new pull request, #4899:
URL: https://github.com/apache/activemq-artemis/pull/4899

   In a cluster deployed in kubernetes, when a node is destroyed it terminates the process and shuts down the network before the process has a chance to close connections. Then a new node might be brought up, reusing the old node’s ip. If this happens before the connection ttl, from artemis’ point of view, it looks like as if the connection came back. Yet it is actually not the same, the peer has a new node id, etc. This messes things up with the cluster, the old message flow record is invalid.
   
   This also solves another similar issue - if a node goes down and a new one comes in with a new nodeUUID and the same IP before the cluster connections in the others timeout, it would cause them to get stuck and list both the old and the new nodes in their topologies.
   
   The changes are grouped in tightly related incremental commits to make it easier to understand what is changed:
   
   1. `Ping` packets include `nodeUUID`
   2. Acceptors and connectors carry `TransportConfiguration`
   3. `RemotingConnectionImpl#doBufferReceived` tracks for ping nodeUUID mismatch with the target to flag it as `unhealthy`;  `ClientSessionFactoryImpl` destroys unhealthy connections(in addition to not receiving any data on time)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] ARTEMIS-4305 Zero persistence does not work in kubernetes [activemq-artemis]

Posted by "iiliev2 (via GitHub)" <gi...@apache.org>.
iiliev2 commented on PR #4899:
URL: https://github.com/apache/activemq-artemis/pull/4899#issuecomment-2078890094

   Initially I attempted what you suggest about lazy initializing the node id like that, precicely because I wanted to keep the code changes to a minimum. However, that ended up being much more complicated(rather than simplified), because of the way `ClientSessionFactoryImpl` creates a new connection object on re-connects. It is very hard to reason about both when reading the code and when needing to debug it at runtime. So instead of this, I had to fill the missing gaps to use the data that is already there anyway, just wasn't being propagated deep enough.
   
   IMO from a functional standpoint, adding the `TransportConfiguration` to the connector(and connection) is the right thing to do here anyway. I assume due to historical reasons, those classes were working with a subset of the data, and no one had a good reason to fix this until now. For example `NettyConnection#getConnectorConfig` was creating a bogus transport configuration, even though when it was created there was a configuration which was not being passed to it.
   
   `Ping` is already the only `Packet` that is being modified. Why do you want to use a raw `byte[]` rather than `UUID`? IMO that will be more confusing - it suggests that there could be other kind of data that can be contained.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] ARTEMIS-4305 Zero persistence does not work in kubernetes [activemq-artemis]

Posted by "jbertram (via GitHub)" <gi...@apache.org>.
jbertram commented on PR #4899:
URL: https://github.com/apache/activemq-artemis/pull/4899#issuecomment-2078606956

   I think you could simplify this quite a bit. Here's what I suggest...
   
   - Don't modify any packet aside from `Ping` and only modify it with a new `byte[]`.
   - The broker should send its node ID in every `Ping`.
   - The first time the client receives a `Ping` it should save the node ID.
   - If a client ever receives a `Ping` that is different from the one it has saved then it should disconnect.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org