Posted to dev@mina.apache.org by "Emmanuel Lecharny (JIRA)" <ji...@apache.org> on 2014/09/11 07:56:34 UTC

[jira] [Updated] (DIRMINA-779) SSLHandler can re-order data that it reads

     [ https://issues.apache.org/jira/browse/DIRMINA-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Lecharny updated DIRMINA-779:
--------------------------------------
    Description: 
The code in question is the flushScheduledEvents() method in SSLHandler.java:

{code}
 public void flushScheduledEvents() {
   // Fire events only when no lock is hold for this handler.
   if (Thread.holdsLock(this)) {
       return;
   }

   Event e;
   // We need synchronization here inevitably because filterWrite can be
   // called simultaneously and cause 'bad record MAC' integrity error.
   synchronized (this) {
       while ((e = filterWriteEventQueue.poll()) != null) {
           e.nextFilter.filterWrite(session, (WriteRequest) e.data);
       }
   }

   while ((e = messageReceivedEventQueue.poll()) != null) {
       e.nextFilter.messageReceived(session, e.data);
   }
 }
{code}

 This method is called both by threads that handle writes and by threads that
 handle reads.  Therefore, as the comments suggest, multiple threads may pass
 through this code simultaneously.  However, because there is no
 synchronization around the processing of messageReceivedEventQueue, received
 messages can be delivered to the next filter out of order whenever more than
 one message is in the queue and a context switch happens at the wrong time.
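
 To make the interleaving concrete, here is a minimal stand-alone Java stress
 test (hypothetical; it is not part of MINA or the attached patch, and uses a
 ConcurrentLinkedQueue where SSLHandler has its own event queues).  Two
 threads drain the same queue without a shared lock, so the order in which
 elements reach "delivered" can differ from the order in which they were
 queued, just as messageReceivedEventQueue can be drained out of order:

{code}
import java.util.concurrent.ConcurrentLinkedQueue;

public class ReorderDemo {
    static final ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
    static final ConcurrentLinkedQueue<Integer> delivered = new ConcurrentLinkedQueue<>();

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 100000; i++) {
            queue.add(i);
        }

        // Both threads run the same unsynchronized drain loop, mirroring
        // the second while loop in flushScheduledEvents().
        Runnable drain = () -> {
            Integer e;
            while ((e = queue.poll()) != null) {
                // A context switch between poll() and add() lets the other
                // thread deliver a later element first.
                delivered.add(e);
            }
        };

        Thread t1 = new Thread(drain);
        Thread t2 = new Thread(drain);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // Scan for an inversion in the delivery order.
        int prev = -1;
        for (int e : delivered) {
            if (e < prev) {
                System.out.println("re-ordered: " + e + " delivered after " + prev);
                return;
            }
            prev = e;
        }
        System.out.println("no re-ordering observed this run");
    }
}
{code}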

 We believe the bug manifested in our application as a failure of our
 protocol layer to decode a message, due to this re-ordering.  It occurred
 only in environments with a large amount of contention and network traffic,
 and only when using TLS.  The fix I have tested was to move the closing
 brace of the synchronized block so that it covers both while loops.  I've
 attached a patch with that change.  Since making the change we have not
 encountered the bug again after about 30 hours of testing and 1.5 TB of
 traffic, whereas before the change we could reproduce it within a few
 minutes.
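
 For reference, the change amounts to extending the synchronized block over
 both loops, roughly as follows (a sketch assuming the same fields and Event
 type as in the snippet above; the exact change is in the attached
 ssl_reodering_fix.diff):

{code}
 public void flushScheduledEvents() {
   // Fire events only when no lock is held for this handler.
   if (Thread.holdsLock(this)) {
       return;
   }

   Event e;

   synchronized (this) {
       while ((e = filterWriteEventQueue.poll()) != null) {
           e.nextFilter.filterWrite(session, (WriteRequest) e.data);
       }

       // Drained under the same lock as the writes, so only one thread at
       // a time can poll and dispatch, preserving the queue order.
       while ((e = messageReceivedEventQueue.poll()) != null) {
           e.nextFilter.messageReceived(session, e.data);
       }
   }
 }
{code}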

> SSLHandler can re-order data that it reads
> ------------------------------------------
>
>                 Key: DIRMINA-779
>                 URL: https://issues.apache.org/jira/browse/DIRMINA-779
>             Project: MINA
>          Issue Type: Bug
>          Components: Filter
>    Affects Versions: 1.1.7, 2.0.0-M1, 2.0.0-M2, 2.0.0-M3, 2.0.0-M4, 2.0.0-M5, 2.0.0-M6, 2.0.0-RC1
>            Reporter: Jason Resch
>             Fix For: 2.0.8
>
>         Attachments: ssl_reodering_fix.diff



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)