You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@activemq.apache.org by francesco81 <Fr...@bticino.it> on 2017/06/15 10:53:21 UTC

Artemis rejects the clientid after client disconnection

Hi,
we're facing another issue with Artemis.
Often it happens that after a client disconnects not gracefully (could be
for many reasons: energy loss, network loss, etc.), Artemis doesn't accept
anymore a connection with the same client id. This results, of course, in a
situation where the client cannot connect any more (unless I restart
Artemis). I seems like Artemis doesn't "really" disconnect the client.
Inside the log (set on TRACE level) I see all the activities that I expect:
1) client connects
2) client sends and receives messages correctly
3) client makes PINGREQ and receives PINGRESP correctly
4) Artemis checks the TTL of the specific connection id related to that
client
5) When client goes down, after <keepaliveSetAtClientSide> * 1,5 seconds
(which in our case result in 90 seconds) Artemis realizes that the specific
connection id is broken, and it disconnects it.
But at this point the client cannot connect any more.
Every time I try to connect by using that specific client id, it is
rejected. And I have no error inside the log. It seems simply as the broker
"think" that a connection with that client id is still up and running. If I
connect with another client id and try to send a message to a topic where
the broken client id was subscribed to, I see inside the log that Artemis
try to send the message to that client id:
MQTT(<clientid>): OUT >> PUBLISH(1) EXACTLY_ONCE
...but really there's no connection for that client id. Not more, anyway.
Unfortunately I'm not yet able to reproduce the bug sistematically. But I'm
hard working to do it, because it's pretty much blocker to us.

Thanks

Francesco




--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Artemis rejects the clientid after client disconnection

Posted by Justin Bertram <jb...@apache.org>.
This may be related to ARTEMIS-1218 [1].


Justin

[1] https://issues.apache.org/jira/browse/ARTEMIS-1218

----- Original Message -----
From: "francesco81" <Fr...@bticino.it>
To: users@activemq.apache.org
Sent: Thursday, June 15, 2017 6:50:20 AM
Subject: Re: Artemis rejects the clientid after client disconnection

Hi Martyn,
thanks so much for your answer.
I'm using the official version 2.1.0, downloaded from the website
(https://www.apache.org/dyn/closer.cgi?filename=activemq/activemq-artemis/2.1.0/apache-artemis-2.1.0-bin.zip&action=download).
As soon as I'm able to reproduce it in a systematic way, I'll let you know.

Attached there's my broker.xml (maybe can help you).
broker.xml <http://activemq.2283324.n4.nabble.com/file/n4727430/broker.xml>  

Francesco



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4727430.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Artemis rejects the clientid after client disconnection

Posted by Michael André Pearce <mi...@me.com>.
I mean to add the try catch within the runnable run method.

Sent from my iPhone

> On 28 Jun 2017, at 16:48, Michael André Pearce <mi...@me.com> wrote:
> 
> This would suggest an uncaught exception could be killing the executor thread.
> 
> If you have a local build, Can you wrap the conn.fail method with a try catch (throwable t)
> 
> See if that helps at all.
> 
> 
> 
> Sent from my iPhone
> 
>> On 28 Jun 2017, at 16:19, francesco81 <Fr...@bticino.it> wrote:
>> 
>> An update.
>> We have re-build "artemis-commons-2.1.0.jar",
>> "artemis-mqtt-protocol-2.1.0.jar" and "artemis-server-2.1.0.jar" in order to
>> improve the logs. It seems that there is a problem with the thread which has
>> in charge the activities to free the clientid, send lwt message, etc..
>> For a few time all works good. Broken connection are correctly cleaned and I
>> see all the logs related to the clientid cleaning and lwt sending.
>> Then, at a certain point (we've not yet undestood why...) connections are no
>> more cleaned correctly. And there are no more logs about the "cleaner"
>> thread. And there's no error. It seems like the connection.fail method
>> simply is not called.
>> 
>> logger.debug("connection id " + conn.getID() + " calling fail executor
>> thread ");                     flushExecutor.execute(new Runnable() {                                               
>> public void run() {                           logger.debug("starting thread
>> for removing connection "+ pair.getA());                          
>> conn.fail(ActiveMQMessageBundle.BUNDLE.clientExited(conn.getRemoteAddress(),
>> pair.getB()));                           logger.debug("thread for connection
>> "+ pair.getA() + " ended correctly");                        }
>> 
>> ...I can't see the log above. Have you any idea about the cause?
>> 
>> Thanks.
>> 
>> Francesco
>> 
>> 
>> 
>> --
>> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4728012.html
>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Artemis rejects the clientid after client disconnection

Posted by Michael André Pearce <mi...@me.com>.
Supplying/Posting  thread dumps and log files would be advantageous, especially without any systematic way to reproduce via integration test. More info the better :)

Locally I'm struggling to reproduce the issue using your broker xml and killing clients manually, but not able to reproduce so flying blind really. Any of the above would help possible tracking down of the issue.

> On 29 Jun 2017, at 11:29, francesco81 <Fr...@bticino.it> wrote:
> 
> Hi Michael,
> I can confirm that the thread is scheduled, because I see the log
> "connection id 1643635981 calling fail executor thread".
> Then I see "RemotingServiceImpl::removing connection ID 1643635981".
> And then ....nothing more: no trace. Like if the thread never takes in
> charge the effective execution.
> 
> Francesco
> 
> 
> 
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4728051.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Artemis rejects the clientid after client disconnection

Posted by francesco81 <Fr...@bticino.it>.
I'm wondering if it's related to this tkt:
https://issues.apache.org/jira/browse/ARTEMIS-1078
I know actually it should had been fixed with version 2.1.0 (that it's the
version I'm using), but anyway it seems very similar.

Francesco



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4728052.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Artemis rejects the clientid after client disconnection

Posted by francesco81 <Fr...@bticino.it>.
Hi Michael,
I can confirm that the thread is scheduled, because I see the log
"connection id 1643635981 calling fail executor thread".
Then I see "RemotingServiceImpl::removing connection ID 1643635981".
And then ....nothing more: no trace. Like if the thread never takes in
charge the effective execution.

Francesco



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4728051.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Artemis rejects the clientid after client disconnection

Posted by Michael André Pearce <mi...@me.com>.
Also could you do a stack trace once it has failed the other option is the executors thread could blocked/deadlocked.

Sent from my iPhone

> On 28 Jun 2017, at 16:48, Michael André Pearce <mi...@me.com> wrote:
> 
> This would suggest an uncaught exception could be killing the executor thread.
> 
> If you have a local build, Can you wrap the conn.fail method with a try catch (throwable t)
> 
> See if that helps at all.
> 
> 
> 
> Sent from my iPhone
> 
>> On 28 Jun 2017, at 16:19, francesco81 <Fr...@bticino.it> wrote:
>> 
>> An update.
>> We have re-build "artemis-commons-2.1.0.jar",
>> "artemis-mqtt-protocol-2.1.0.jar" and "artemis-server-2.1.0.jar" in order to
>> improve the logs. It seems that there is a problem with the thread which has
>> in charge the activities to free the clientid, send lwt message, etc..
>> For a few time all works good. Broken connection are correctly cleaned and I
>> see all the logs related to the clientid cleaning and lwt sending.
>> Then, at a certain point (we've not yet undestood why...) connections are no
>> more cleaned correctly. And there are no more logs about the "cleaner"
>> thread. And there's no error. It seems like the connection.fail method
>> simply is not called.
>> 
>> logger.debug("connection id " + conn.getID() + " calling fail executor
>> thread ");                     flushExecutor.execute(new Runnable() {                                               
>> public void run() {                           logger.debug("starting thread
>> for removing connection "+ pair.getA());                          
>> conn.fail(ActiveMQMessageBundle.BUNDLE.clientExited(conn.getRemoteAddress(),
>> pair.getB()));                           logger.debug("thread for connection
>> "+ pair.getA() + " ended correctly");                        }
>> 
>> ...I can't see the log above. Have you any idea about the cause?
>> 
>> Thanks.
>> 
>> Francesco
>> 
>> 
>> 
>> --
>> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4728012.html
>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Artemis rejects the clientid after client disconnection

Posted by Michael André Pearce <mi...@me.com>.
This would suggest an uncaught exception could be killing the executor thread.

If you have a local build, Can you wrap the conn.fail method with a try catch (throwable t)

See if that helps at all.



Sent from my iPhone

> On 28 Jun 2017, at 16:19, francesco81 <Fr...@bticino.it> wrote:
> 
> An update.
> We have re-build "artemis-commons-2.1.0.jar",
> "artemis-mqtt-protocol-2.1.0.jar" and "artemis-server-2.1.0.jar" in order to
> improve the logs. It seems that there is a problem with the thread which has
> in charge the activities to free the clientid, send lwt message, etc..
> For a few time all works good. Broken connection are correctly cleaned and I
> see all the logs related to the clientid cleaning and lwt sending.
> Then, at a certain point (we've not yet undestood why...) connections are no
> more cleaned correctly. And there are no more logs about the "cleaner"
> thread. And there's no error. It seems like the connection.fail method
> simply is not called.
> 
> logger.debug("connection id " + conn.getID() + " calling fail executor
> thread ");                     flushExecutor.execute(new Runnable() {                                               
> public void run() {                           logger.debug("starting thread
> for removing connection "+ pair.getA());                          
> conn.fail(ActiveMQMessageBundle.BUNDLE.clientExited(conn.getRemoteAddress(),
> pair.getB()));                           logger.debug("thread for connection
> "+ pair.getA() + " ended correctly");                        }
> 
> ...I can't see the log above. Have you any idea about the cause?
> 
> Thanks.
> 
> Francesco
> 
> 
> 
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4728012.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Artemis rejects the clientid after client disconnection

Posted by francesco81 <Fr...@bticino.it>.
An update.
We have re-build "artemis-commons-2.1.0.jar",
"artemis-mqtt-protocol-2.1.0.jar" and "artemis-server-2.1.0.jar" in order to
improve the logs. It seems that there is a problem with the thread which has
in charge the activities to free the clientid, send lwt message, etc..
For a few time all works good. Broken connection are correctly cleaned and I
see all the logs related to the clientid cleaning and lwt sending.
Then, at a certain point (we've not yet undestood why...) connections are no
more cleaned correctly. And there are no more logs about the "cleaner"
thread. And there's no error. It seems like the connection.fail method
simply is not called.

logger.debug("connection id " + conn.getID() + " calling fail executor
thread ");                     flushExecutor.execute(new Runnable() {                                               
public void run() {                           logger.debug("starting thread
for removing connection "+ pair.getA());                          
conn.fail(ActiveMQMessageBundle.BUNDLE.clientExited(conn.getRemoteAddress(),
pair.getB()));                           logger.debug("thread for connection
"+ pair.getA() + " ended correctly");                        }

...I can't see the log above. Have you any idea about the cause?

Thanks.

Francesco



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4728012.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Artemis rejects the clientid after client disconnection

Posted by francesco81 <Fr...@bticino.it>.
Just to give you all possible information:
when the client goes down (as I wrote above: not gracefully) Artemis does
NOT send any lwt message. Actuallly the client, on connection, indicates an
lwt topic and message. And in case of gracefully disconnection Artemis sends
it... In this case, instead, Artemis does NOT send it. I don't know if it's
cause, consequence or simply part of the problem ...anyway it's related.
Hope can help.

Francesco



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4727444.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Artemis rejects the clientid after client disconnection

Posted by francesco81 <Fr...@bticino.it>.
Hi Martyn,
thanks so much for your answer.
I'm using the official version 2.1.0, downloaded from the website
(https://www.apache.org/dyn/closer.cgi?filename=activemq/activemq-artemis/2.1.0/apache-artemis-2.1.0-bin.zip&action=download).
As soon as I'm able to reproduce it in a systematic way, I'll let you know.

Attached there's my broker.xml (maybe can help you).
broker.xml <http://activemq.2283324.n4.nabble.com/file/n4727430/broker.xml>  

Francesco



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-rejects-the-clientid-after-client-disconnection-tp4727426p4727430.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Artemis rejects the clientid after client disconnection

Posted by Martyn Taylor <mt...@redhat.com>.
Hi Francesco,

What version of Artemis are you using?

From your description it does look like the connection is not cleaned up
after the TTL.  I'll attempt to reproduce locally.

Thanks


On Thu, Jun 15, 2017 at 11:53 AM, francesco81 <Francesco.PADOVANI@bticino.it
> wrote:

> Hi,
> we're facing another issue with Artemis.
> Often it happens that after a client disconnects not gracefully (could be
> for many reasons: energy loss, network loss, etc.), Artemis doesn't accept
> anymore a connection with the same client id. This results, of course, in a
> situation where the client cannot connect any more (unless I restart
> Artemis). I seems like Artemis doesn't "really" disconnect the client.
> Inside the log (set on TRACE level) I see all the activities that I expect:
> 1) client connects
> 2) client sends and receives messages correctly
> 3) client makes PINGREQ and receives PINGRESP correctly
> 4) Artemis checks the TTL of the specific connection id related to that
> client
> 5) When client goes down, after <keepaliveSetAtClientSide> * 1,5 seconds
> (which in our case result in 90 seconds) Artemis realizes that the specific
> connection id is broken, and it disconnects it.
> But at this point the client cannot connect any more.
> Every time I try to connect by using that specific client id, it is
> rejected. And I have no error inside the log. It seems simply as the broker
> "think" that a connection with that client id is still up and running. If I
> connect with another client id and try to send a message to a topic where
> the broken client id was subscribed to, I see inside the log that Artemis
> try to send the message to that client id:
> MQTT(<clientid>): OUT >> PUBLISH(1) EXACTLY_ONCE
> ...but really there's no connection for that client id. Not more, anyway.
> Unfortunately I'm not yet able to reproduce the bug sistematically. But I'm
> hard working to do it, because it's pretty much blocker to us.
>
> Thanks
>
> Francesco
>
>
>
>
> --
> View this message in context: http://activemq.2283324.n4.
> nabble.com/Artemis-rejects-the-clientid-after-client-
> disconnection-tp4727426.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>