You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mina.apache.org by Christoph John <ch...@macd.com> on 2016/10/14 21:47:00 UTC

Re: observing hangs after update to MINA 2.0.14

Hi Emmanuel,

I think I have found the problem in our code and it is most likely related to 
https://issues.apache.org/jira/browse/DIRMINA-1025 .
In the code we were calling ioSession.closeNow(). This worked before but due to the changed flushing 
behaviour messages sometimes got lost on disconnect which lead to hangs because expected messages 
were not received.

I have another question regarding this. Here is the code that is called when disconnecting the session:
----------
         // This is primarily to allow logout messages to be sent before
         // closing the socket. Future versions of MINA may have support
         // in close() to force all pending messages to be written before
         // the socket close is performed.
         //
         // Only wait for a limited time since MINA may deadlock
         // in some rare cases where a socket dies in a strange way.
         for (int i = 0; i < 5 && ioSession.getScheduledWriteMessages() > 0; i++) {
             try {
                 Thread.sleep(10L);
             } catch (InterruptedException e) {
                 Thread.currentThread().interrupt();
             }
         }

         // We cannot call join() on the CloseFuture returned
         // by the following call. We are using a minimal
         // threading model and calling join will prevent the
         // close event from being processed by this thread (if
         // this thread is the MINA IO processor thread.
         ioSession.closeOnFlush();
----------

This code was already there when we used MINA 1.x, so I do not know if the for-loop is still 
necessary when calling closeOnFlush() afterwards. It also did not help in our situation when we 
still called closeNow(). So either waiting 5*10 milliseconds was too short or there were no 
scheduled write messages present.

What about the comment about the deadlock? If I understand correctly then I have a probability of 
hitting https://issues.apache.org/jira/browse/DIRMINA-893 ?? So I should probably go for await() or 
await(timeout)?

Thanks in advance for your help.
Best regards,
Chris.


On 26/09/16 11:22, Emmanuel Lécharny wrote:
> Le 26/09/16 à 09:01, Christoph John a écrit :
>> Until now I could not reproduce the problem reliably. However, I think
>> it might not be a "hang" but have to do with the changed behaviour
>> regarding flushing that has been fixed in MINA 2.0.14 (DIRMINA-1025/1021)
>> E.g. the tests wait for a message to arrive that the other side has
>> sent but it doesn't seem to be received.
>>
>> I guess I'll have to review the closeNow/closeOnFlush calls in
>> QuickFIX/J then? Maybe it is only related to the test suite, though.
> I think you sent a test case that reproduce the issue (AFAIR). I ran
> this test and got a hang, if I try to close both ends, that is what I'd
> like to investigate.
>
> Sadly, I have been pretty busy lately, and I had to cut a new release
> 2.0.15 urgently, due to a big bug in SSL, so I didn't had chance to go
> any further. I hope to have time to do that this week.
>

-- 
Christoph John
Development & Support
Direct: +49 241 557080-28
Mailto:Christoph.John@macd.com
	


http://www.macd.com <http://www.macd.com/>
----------------------------------------------------------------------------------------------------
	
----------------------------------------------------------------------------------------------------
MACD GmbH
Oppenhoffallee 103
D-52066 Aachen
Tel: +49 241 557080-0 | Fax: +49 241 557080-10
	 Amtsgericht Aachen: HRB 8151
Ust.-Id: DE 813021663

Geschäftsführer: George Macdonald
----------------------------------------------------------------------------------------------------
	
----------------------------------------------------------------------------------------------------

take care of the environment - print only if necessary

Re: observing hangs after update to MINA 2.0.14

Posted by Christoph John <ch...@macd.com>.
Hi Emmanuel,

thanks for your input.

Best regards,
Chris.


On 15/10/16 01:10, Emmanuel Lecharny wrote:
> On Fri, Oct 14, 2016 at 11:47 PM, Christoph John <ch...@macd.com>
> wrote:
>
>> Hi Emmanuel,
>>
>> I think I have found the problem in our code and it is most likely related
>> to https://issues.apache.org/jira/browse/DIRMINA-1025 .
>> In the code we were calling ioSession.closeNow(). This worked before but
>> due to the changed flushing behaviour messages sometimes got lost on
>> disconnect which lead to hangs because expected messages were not received.
>>
> closeNow() don't try to flush anything anymore : it kind of 'brutally'
> close the connection. Calling such a method will also make it impossibe to
> receive any message. Is that the expected behaviour for you ?
>
>
>
>> I have another question regarding this. Here is the code that is called
>> when disconnecting the session:
>> ----------
>>          // This is primarily to allow logout messages to be sent before
>>          // closing the socket. Future versions of MINA may have support
>>          // in close() to force all pending messages to be written before
>>          // the socket close is performed.
>>          //
>>          // Only wait for a limited time since MINA may deadlock
>>          // in some rare cases where a socket dies in a strange way.
>>          for (int i = 0; i < 5 && ioSession.getScheduledWriteMessages() >
>> 0; i++) {
>>              try {
>>                  Thread.sleep(10L);
>>              } catch (InterruptedException e) {
>>                  Thread.currentThread().interrupt();
>>              }
>>          }
>>
>>          // We cannot call join() on the CloseFuture returned
>>          // by the following call. We are using a minimal
>>          // threading model and calling join will prevent the
>>          // close event from being processed by this thread (if
>>          // this thread is the MINA IO processor thread.
>>          ioSession.closeOnFlush();
>> ----------
>>
>> This code was already there when we used MINA 1.x, so I do not know if the
>> for-loop is still necessary when calling closeOnFlush() afterwards.
>
> closeOnFlush() is trying to send all the messages in the write queue,
> before actually closing the session. I do think teh loop is useless.
>
>
>
>> It also did not help in our situation when we still called closeNow(). So
>> either waiting 5*10 milliseconds was too short or there were no scheduled
>> write messages present.
>>
>> What about the comment about the deadlock?
>
> My understanding is that the session disconnection was never detected, and
> there were some pending message, which oviously weren't able to be
> transmitted, so the closeOnFlush() call was blocking forever. MINA 2.0.14
> should not behave this way because an exception occuring because the
> session is closed will empty the write queue.
>
>
>> If I understand correctly then I have a probability of hitting
>> https://issues.apache.org/jira/browse/DIRMINA-893 ?? So I should probably
>> go for await() or await(timeout)?
>>
> I don't know. The JIRA you are mentionning is very secific : it's about
> using MINA to implement a proxy, with one connection dying, leving the
> other pending for ever. As this is an application we haven't written, it's
> hard to reproduce it without a clear exemple that we don't have.
>

-- 
Christoph John
Development & Support
Direct: +49 241 557080-28
Mailto:Christoph.John@macd.com
	


http://www.macd.com <http://www.macd.com/>
----------------------------------------------------------------------------------------------------
	
----------------------------------------------------------------------------------------------------
MACD GmbH
Oppenhoffallee 103
D-52066 Aachen
Tel: +49 241 557080-0 | Fax: +49 241 557080-10
	 Amtsgericht Aachen: HRB 8151
Ust.-Id: DE 813021663

Geschäftsführer: George Macdonald
----------------------------------------------------------------------------------------------------
	
----------------------------------------------------------------------------------------------------

take care of the environment - print only if necessary

Re: observing hangs after update to MINA 2.0.14

Posted by Emmanuel Lecharny <el...@apache.org>.
On Fri, Oct 14, 2016 at 11:47 PM, Christoph John <ch...@macd.com>
wrote:

> Hi Emmanuel,
>
> I think I have found the problem in our code and it is most likely related
> to https://issues.apache.org/jira/browse/DIRMINA-1025 .
> In the code we were calling ioSession.closeNow(). This worked before but
> due to the changed flushing behaviour messages sometimes got lost on
> disconnect which lead to hangs because expected messages were not received.
>

closeNow() don't try to flush anything anymore : it kind of 'brutally'
close the connection. Calling such a method will also make it impossibe to
receive any message. Is that the expected behaviour for you ?



> I have another question regarding this. Here is the code that is called
> when disconnecting the session:
> ----------
>         // This is primarily to allow logout messages to be sent before
>         // closing the socket. Future versions of MINA may have support
>         // in close() to force all pending messages to be written before
>         // the socket close is performed.
>         //
>         // Only wait for a limited time since MINA may deadlock
>         // in some rare cases where a socket dies in a strange way.
>         for (int i = 0; i < 5 && ioSession.getScheduledWriteMessages() >
> 0; i++) {
>             try {
>                 Thread.sleep(10L);
>             } catch (InterruptedException e) {
>                 Thread.currentThread().interrupt();
>             }
>         }
>
>         // We cannot call join() on the CloseFuture returned
>         // by the following call. We are using a minimal
>         // threading model and calling join will prevent the
>         // close event from being processed by this thread (if
>         // this thread is the MINA IO processor thread.
>         ioSession.closeOnFlush();
> ----------
>
> This code was already there when we used MINA 1.x, so I do not know if the
> for-loop is still necessary when calling closeOnFlush() afterwards.


closeOnFlush() is trying to send all the messages in the write queue,
before actually closing the session. I do think teh loop is useless.



> It also did not help in our situation when we still called closeNow(). So
> either waiting 5*10 milliseconds was too short or there were no scheduled
> write messages present.
>
> What about the comment about the deadlock?


My understanding is that the session disconnection was never detected, and
there were some pending message, which oviously weren't able to be
transmitted, so the closeOnFlush() call was blocking forever. MINA 2.0.14
should not behave this way because an exception occuring because the
session is closed will empty the write queue.


> If I understand correctly then I have a probability of hitting
> https://issues.apache.org/jira/browse/DIRMINA-893 ?? So I should probably
> go for await() or await(timeout)?
>

I don't know. The JIRA you are mentionning is very secific : it's about
using MINA to implement a proxy, with one connection dying, leving the
other pending for ever. As this is an application we haven't written, it's
hard to reproduce it without a clear exemple that we don't have.

>
> Thanks in advance for your help.
> Best regards,
> Chris.
>
>
>
> On 26/09/16 11:22, Emmanuel Lécharny wrote:
>
>> Le 26/09/16 à 09:01, Christoph John a écrit :
>>
>>> Until now I could not reproduce the problem reliably. However, I think
>>> it might not be a "hang" but have to do with the changed behaviour
>>> regarding flushing that has been fixed in MINA 2.0.14 (DIRMINA-1025/1021)
>>> E.g. the tests wait for a message to arrive that the other side has
>>> sent but it doesn't seem to be received.
>>>
>>> I guess I'll have to review the closeNow/closeOnFlush calls in
>>> QuickFIX/J then? Maybe it is only related to the test suite, though.
>>>
>> I think you sent a test case that reproduce the issue (AFAIR). I ran
>> this test and got a hang, if I try to close both ends, that is what I'd
>> like to investigate.
>>
>> Sadly, I have been pretty busy lately, and I had to cut a new release
>> 2.0.15 urgently, due to a big bug in SSL, so I didn't had chance to go
>> any further. I hope to have time to do that this week.
>>
>>
> --
> Christoph John
> Development & Support
> Direct: +49 241 557080-28
> Mailto:Christoph.John@macd.com
>
>
>
> http://www.macd.com <http://www.macd.com/>
> ------------------------------------------------------------
> ----------------------------------------
>
> ------------------------------------------------------------
> ----------------------------------------
> MACD GmbH
> Oppenhoffallee 103
> D-52066 Aachen
> Tel: +49 241 557080-0 | Fax: +49 241 557080-10
>          Amtsgericht Aachen: HRB 8151
> Ust.-Id: DE 813021663
>
> Geschäftsführer: George Macdonald
> ------------------------------------------------------------
> ----------------------------------------
>
> ------------------------------------------------------------
> ----------------------------------------
>
> take care of the environment - print only if necessary
>



-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com