Posted to dev@avro.apache.org by "Karel Vervaeke (JIRA)" <ji...@apache.org> on 2012/09/11 11:37:07 UTC

[jira] [Created] (AVRO-1154) NettyTransceiver: cancel channelFuture on close

Karel Vervaeke created AVRO-1154:
------------------------------------

             Summary: NettyTransceiver: cancel channelFuture on close
                 Key: AVRO-1154
                 URL: https://issues.apache.org/jira/browse/AVRO-1154
             Project: Avro
          Issue Type: Bug
            Reporter: Karel Vervaeke


See AVRO-747.
When the server is stopped, threads calling writeDataPacket() will hang for 'connectTimeoutMillis'.
If another thread calls disconnect() while the first thread is still waiting, it would be nice if that writeDataPacket() call immediately threw an IOException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-1154) NettyTransceiver: cancel channelFuture on close

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453508#comment-13453508 ] 

Doug Cutting commented on AVRO-1154:
------------------------------------

Can you please add a test that fails without this patch and succeeds with it?

How much stress testing of this have you done?
                

[jira] [Resolved] (AVRO-1154) NettyTransceiver: cancel channelFuture on close

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting resolved AVRO-1154.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 1.7.2
         Assignee: Karel Vervaeke

I committed this.  Thanks Karel & Bruno!
                

[jira] [Commented] (AVRO-1154) NettyTransceiver: cancel channelFuture on close

Posted by "Karel Vervaeke (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452863#comment-13452863 ] 

Karel Vervaeke commented on AVRO-1154:
--------------------------------------

The description is a bit too concise.
NettyTransceiver.writeDataPacket() calls getChannel(), which in turn calls channelFuture.awaitUninterruptibly(connectTimeoutMillis).

NettyTransceiver.close() calls disconnect(), which hangs until the first call's wait reaches the specified timeout.
The attached patch immediately cancels the channelFuture, so the waiting clients are no longer blocked.

It's difficult to write a test case for this, since you need a server thread and two client threads, performing operations in the correct order.
                

[jira] [Commented] (AVRO-1154) NettyTransceiver: cancel channelFuture on close

Posted by "Bruno Dumon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454913#comment-13454913 ] 

Bruno Dumon commented on AVRO-1154:
-----------------------------------

I've added a testcase which shows the problem. See the line I've marked with "Without the patch, this close seems to hang forever"

What basically happens is that when a server is stopped and then restarted, requests don't resume immediately. This is because the client (NettyTransceiver) gets stuck on channelFuture.awaitUninterruptibly():

{noformat}
"Thread-65" prio=10 tid=0x00007f2bac67e000 nid=0x7179 in Object.wait() [0x00007f2ad21e0000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000007d8d8dd28> (a org.jboss.netty.channel.DefaultChannelFuture)
        at java.lang.Object.wait(Object.java:443)
        at org.jboss.netty.channel.DefaultChannelFuture.await0(DefaultChannelFuture.java:283)
        - locked <0x00000007d8d8dd28> (a org.jboss.netty.channel.DefaultChannelFuture)
        at org.jboss.netty.channel.DefaultChannelFuture.awaitUninterruptibly(DefaultChannelFuture.java:255)
        at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:247)
        at org.apache.avro.ipc.NettyTransceiver.getRemoteName(NettyTransceiver.java:364)
        at org.apache.avro.ipc.Requestor.writeHandshake(Requestor.java:202)
        at org.apache.avro.ipc.Requestor.access$300(Requestor.java:52)
        at org.apache.avro.ipc.Requestor$Request.getBytes(Requestor.java:478)
        at org.apache.avro.ipc.Requestor.request(Requestor.java:147)
        at org.apache.avro.ipc.Requestor.request(Requestor.java:101)
        at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:72)
        at $Proxy6.send(Unknown Source)
        at org.apache.avro.ipc.TestNettyTransceiverWhenServerStops$1.run(TestNettyTransceiverWhenServerStops.java:45)
        at java.lang.Thread.run(Thread.java:662)
{noformat}

The strange thing is that it blocks even when there is no server available (e.g. try telnetting to a port where no daemon is listening: it will fail immediately rather than take a minute to decide nobody is listening). This seems like an issue to me, but to be clear, it is not what this patch addresses (there is a commented-out section in the patch that covers it).

Now, in our situation, when a server disappears we also get notified about that via ZooKeeper and then decide to close the corresponding NettyTransceiver. However, this call to close also hangs, and that's what the patch on this issue solves:

{noformat}
"main" prio=10 tid=0x00007f2bac008000 nid=0x7126 waiting on condition [0x00007f2bb0aaf000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000783e798e0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
        at org.apache.avro.ipc.NettyTransceiver.disconnect(NettyTransceiver.java:286)
        at org.apache.avro.ipc.NettyTransceiver.close(NettyTransceiver.java:353)
        at org.apache.avro.ipc.TestNettyTransceiverWhenServerStops.testNettyTransceiverWhenServerStops(TestNettyTransceiverWhenServerStops.java:105)
{noformat}

It hangs because it tries to acquire a lock that is already held by the thread above, which is waiting in channelFuture.awaitUninterruptibly().
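
The same interaction can be reproduced outside Avro with plain java.util.concurrent classes. The sketch below is only an illustration, not NettyTransceiver's actual code: BlockingCloseSketch, stateLock, connectFuture and measureCloseMillis are made-up names, and a CompletableFuture that is never completed stands in for Netty's ChannelFuture. It assumes that the connecting thread waits while holding the write lock that close() also needs.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustration only: hypothetical names, not the real NettyTransceiver. */
final class BlockingCloseSketch {
    private final ReentrantReadWriteLock stateLock = new ReentrantReadWriteLock();
    // Stand-in for Netty's ChannelFuture; never completed, like a connect to a stopped server.
    private final CompletableFuture<Void> connectFuture = new CompletableFuture<>();
    // Lets the demo know that the client thread holds the lock and is about to wait.
    private final CountDownLatch connecting = new CountDownLatch(1);

    /** Waits for the connection while holding the write lock (assumed locking scheme). */
    void getChannel(long connectTimeoutMillis) throws IOException {
        stateLock.writeLock().lock();
        try {
            connecting.countDown();
            connectFuture.get(connectTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            throw new IOException("could not connect", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while connecting", e);
        } finally {
            stateLock.writeLock().unlock();
        }
    }

    /** Needs the same write lock, so it cannot proceed until getChannel() gives up. */
    void close() {
        stateLock.writeLock().lock();
        try {
            // A real transceiver would release its channel here.
        } finally {
            stateLock.writeLock().unlock();
        }
    }

    /** Returns how long close() took while another thread was still connecting. */
    static long measureCloseMillis(long connectTimeoutMillis) {
        BlockingCloseSketch transceiver = new BlockingCloseSketch();
        Thread client = new Thread(() -> {
            try {
                transceiver.getChannel(connectTimeoutMillis);
            } catch (IOException expected) {
                // The connect attempt times out because nothing completes the future.
            }
        });
        client.start();
        try {
            transceiver.connecting.await();
            long start = System.nanoTime();
            transceiver.close();
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            client.join();
            return elapsedMillis;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("close() took about " + measureCloseMillis(1000) + " ms");
    }
}
```

In this sketch close() returns only after roughly the whole connect timeout has elapsed, which is the behaviour shown by the second thread dump.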

What the patch basically does is cancel this channelFuture, so that the thread calling close can obtain the lock and do its work. Before that, it sets a boolean variable (called stopping) so that no new threads try to connect to the channel, and cause the same situation again, while we want to stop.
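
The idea can be sketched with standard library classes as follows. This is a hedged illustration of the approach described above, not the committed patch: CancelOnCloseSketch, stateLock, connectFuture and closeWhileConnecting are hypothetical names, a CompletableFuture stands in for Netty's ChannelFuture, and the locking scheme is an assumption. Only the stopping flag and the cancel-before-lock ordering are taken from the description.

```java
import java.io.IOException;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustration only: hypothetical names, not the committed AVRO-1154 patch. */
final class CancelOnCloseSketch {
    private final ReentrantReadWriteLock stateLock = new ReentrantReadWriteLock();
    // Stand-in for Netty's ChannelFuture; never completed, like a connect to a stopped server.
    private final CompletableFuture<Void> connectFuture = new CompletableFuture<>();
    private final CountDownLatch connecting = new CountDownLatch(1);
    // Set by close() so that later callers do not start waiting for a connection.
    private volatile boolean stopping;

    void getChannel(long connectTimeoutMillis) throws IOException {
        if (stopping) {
            throw new IOException("transceiver is closing");
        }
        stateLock.writeLock().lock();
        try {
            connecting.countDown();
            connectFuture.get(connectTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (CancellationException e) {
            throw new IOException("connect cancelled by close()", e);
        } catch (TimeoutException | ExecutionException e) {
            throw new IOException("could not connect", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while connecting", e);
        } finally {
            stateLock.writeLock().unlock();
        }
    }

    void close() {
        stopping = true;             // keep new callers out first
        connectFuture.cancel(true);  // wake the waiting thread so it releases the lock
        stateLock.writeLock().lock();
        try {
            // A real transceiver would release its channel here.
        } finally {
            stateLock.writeLock().unlock();
        }
    }

    /** Returns how long close() took while another thread was still connecting. */
    static long closeWhileConnecting(long connectTimeoutMillis) {
        CancelOnCloseSketch transceiver = new CancelOnCloseSketch();
        AtomicReference<IOException> clientError = new AtomicReference<>();
        Thread client = new Thread(() -> {
            try {
                transceiver.getChannel(connectTimeoutMillis);
            } catch (IOException e) {
                clientError.set(e);
            }
        });
        client.start();
        try {
            transceiver.connecting.await();
            long start = System.nanoTime();
            transceiver.close();
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            client.join();
            if (clientError.get() == null) {
                throw new IllegalStateException("the waiting client should have failed");
            }
            return elapsedMillis;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("close() took about " + closeWhileConnecting(5000) + " ms");
    }
}
```

Here close() no longer waits for the connect timeout, and the thread that was waiting gets an IOException instead. The sketch uses a single future and does not try to cover every race between a new connect attempt and close().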
                

[jira] [Updated] (AVRO-1154) NettyTransceiver: cancel channelFuture on close

Posted by "Bruno Dumon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bruno Dumon updated AVRO-1154:
------------------------------

    Attachment: testcase.patch.txt
    

[jira] [Updated] (AVRO-1154) NettyTransceiver: cancel channelFuture on close

Posted by "Karel Vervaeke (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karel Vervaeke updated AVRO-1154:
---------------------------------

    Attachment: AVRO-1154.v2.patch
    

[jira] [Updated] (AVRO-1154) NettyTransceiver: cancel channelFuture on close

Posted by "Karel Vervaeke (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karel Vervaeke updated AVRO-1154:
---------------------------------

    Attachment: AVRO-1154.patch

Attached patch calls channelFuture.cancel() on close().
                

[jira] [Updated] (AVRO-1154) NettyTransceiver: cancel channelFuture on close

Posted by "Karel Vervaeke (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karel Vervaeke updated AVRO-1154:
---------------------------------

    Attachment:     (was: AVRO-1154.v2.patch)
    

[jira] [Updated] (AVRO-1154) NettyTransceiver: cancel channelFuture on close

Posted by "Karel Vervaeke (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karel Vervaeke updated AVRO-1154:
---------------------------------

    Attachment: AVRO-1154.v2.patch

Updated patch to deal with a few potential concurrency issues.
                