Posted to users@activemq.apache.org by foo bar <st...@gmail.com> on 2021/11/12 16:40:06 UTC

ArtemisMQ 2.15 Messages stuck internal cluster connection queue $artemis.internal...

Hello,

We lost one of the nodes in our cluster. After we recreated it, we noticed
that the cluster connection queues ($artemis.internal queues) on other
nodes in the cluster contain messages that are stuck. Those cluster
connection queues likely point to the old node, which no longer exists.
There are zero consumers on these $artemis.internal queues. I can browse
them via the UI, and I can delete from them, but if I execute a retryMessage
from the UI it does nothing. What is the procedure to get these messages to
their original destination and, once this is done, remove the cluster
connection queue, since Artemis seems to have created new ones for the new
node?

Thanks

Re: ArtemisMQ 2.15 Messages stuck internal cluster connection queue $artemis.internal...

Posted by Gary Tully <ga...@gmail.com>.

I would have thought redistributionDelay > 0 would be the answer here,
but I have not verified.
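
For reference, redistribution delay is set per address in broker.xml through
the redistribution-delay address setting. A minimal sketch, assuming a
catch-all match of "#" and an illustrative delay of 5000 ms (both are
placeholders, not values from this thread). Whether this has any effect on an
orphaned $artemis.internal store-and-forward queue is unverified:

```xml
<!-- Fragment of broker.xml, inside <core>. Values are illustrative only. -->
<address-settings>
   <!-- "#" matches every address; narrow this in a real deployment -->
   <address-setting match="#">
      <!-- Milliseconds to wait after the last local consumer closes before
           messages are redistributed to other cluster nodes.
           -1 disables redistribution. -->
      <redistribution-delay>5000</redistribution-delay>
   </address-setting>
</address-settings>
```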

Re: ArtemisMQ 2.15 Messages stuck internal cluster connection queue $artemis.internal...

Posted by foo bar <st...@gmail.com>.

I set the reconnect attempts to be a finite value and disabled/enabled the
cluster connection via the management API but nothing happened - the
messages are still stuck there. Anything else we can do?

On Fri, Nov 12, 2021 at 12:24 PM Justin Bertram <jb...@apache.org> wrote:

> Based on your description of the problem it sounds like...
>
>  1) When you recreated your cluster node you didn't restore the journal
> from the node you lost which means the recreated node has a brand new node
> ID.
>  2) You're using <reconnect-attempts>-1</reconnect-attempts> in your
> <cluster-connection>.
>
> Can you confirm this is actually the case? If so, you're seeing the
> expected behavior. As long as one node is attempting to reconnect to
> another node that has dropped out of the cluster it will maintain the
> internal store-and-forward queue for messages designated for the node which
> dropped out of the cluster. As soon as the cluster-connection gives up
> retrying then all the messages in the internal store-and-forward queue will
> be sent back to their original queues.
>
> Therefore, to avoid getting into this situation you should either restore
> the journal from the node that dropped out of the cluster or you should
> configure <reconnect-attempts> to be a finite value and wait for the
> retries to be exhausted.
>
> I'm not sure there is a clean way to recover from this situation after the
> fact. I'll investigate further when I have more time. Here are some ideas
> off the top of my head:
>
>  - Stop the broker, change <reconnect-attempts> to be a finite value, and
> restart.
>  - Stop the cluster connection via the management API and then restart it.
>
>
> Justin

Re: ArtemisMQ 2.15 Messages stuck internal cluster connection queue $artemis.internal...

Posted by foo bar <st...@gmail.com>.

I can confirm

1.) we did not restore the journal
2.) we have not specified reconnect-attempts in the cluster connection
which should default to -1 as you noted

If we specify a finite value for reconnect-attempts, will this also apply to
the orphaned cluster connection? I ask because when I look at the logs I
don't see it trying to reconnect. The only thing I see is successful bridge
and cluster connections to the new node. If I recall correctly, I normally
see repeated reconnect/connect entries in the log when either a bridge or a
cluster member is down.

Re: ArtemisMQ 2.15 Messages stuck internal cluster connection queue $artemis.internal...

Posted by Justin Bertram <jb...@apache.org>.

Based on your description of the problem it sounds like...

 1) When you recreated your cluster node you didn't restore the journal
from the node you lost which means the recreated node has a brand new node
ID.
 2) You're using <reconnect-attempts>-1</reconnect-attempts> in your
<cluster-connection>.

Can you confirm this is actually the case? If so, you're seeing the
expected behavior. As long as one node is attempting to reconnect to
another node that has dropped out of the cluster it will maintain the
internal store-and-forward queue for messages designated for the node which
dropped out of the cluster. As soon as the cluster-connection gives up
retrying then all the messages in the internal store-and-forward queue will
be sent back to their original queues.

Therefore, to avoid getting into this situation you should either restore
the journal from the node that dropped out of the cluster or you should
configure <reconnect-attempts> to be a finite value and wait for the
retries to be exhausted.

I'm not sure there is a clean way to recover from this situation after the
fact. I'll investigate further when I have more time. Here are some ideas
off the top of my head:

 - Stop the broker, change <reconnect-attempts> to be a finite value, and
restart.
 - Stop the cluster connection via the management API and then restart it.
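
The first idea above corresponds to a broker.xml change along these lines.
This is a sketch only: the cluster name, connector name, discovery group and
the retry values are placeholders, not taken from the poster's configuration.

```xml
<!-- Fragment of broker.xml, inside <core>. All names and values here are
     hypothetical and must be adapted to the existing cluster-connection. -->
<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <!-- Milliseconds between reconnect attempts -->
      <retry-interval>1000</retry-interval>
      <!-- Finite value: once these attempts are exhausted, the
           cluster-connection gives up on the missing node.
           The default of -1 retries forever. -->
      <reconnect-attempts>10</reconnect-attempts>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <discovery-group-ref discovery-group-name="my-discovery-group"/>
   </cluster-connection>
</cluster-connections>
```

With these example values the connection would stop retrying after roughly
10 seconds, assuming no retry-interval-multiplier is configured.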

Justin
