You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Vishal K (JIRA)" <ji...@apache.org> on 2010/11/17 17:09:14 UTC

[jira] Created: (ZOOKEEPER-932) Move blocking read/write calls to SendWorker and RecvWorker Threads

Move blocking read/write calls to SendWorker and RecvWorker Threads
-------------------------------------------------------------------

                 Key: ZOOKEEPER-932
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-932
             Project: Zookeeper
          Issue Type: Sub-task
            Reporter: Vishal K
            Assignee: Vishal K
             Fix For: 3.4.0


Copying relevant comments:

Vishal K added a comment - 02/Nov/10 02:09 PM
Hi Flavio,

I have a suggestion for changing the blocking IO code in QuorumCnxManager. It keeps the current code structure and requires a small amount of changes. I am not sure if these comments should go in ZOOKEEPER-901. ZOOKEEPER-901 is probably addressing netty as well. Please feel free to close this JIRA if you intend to make all the changes as a part of ZOOKEEPER-901.

Basically we jusy need to move parts of initiateConnection and receiveConnection to SenderWorker and ReceiveWorker.

A. Current flow for receiving connection:
1. accept connection in Listener.run()
2. receiveConnection()

    * Read remote server's ID
    * Take action based on my ID and remote server's ID (disconnect and reconnect if my ID is > remote server's ID).
    * kill current set of SenderWorker and ReciveWorker threads
    * Start a new pair

B Current flow for initiating connection:
1. In connectOne(), connect if not already connected. else return.
2. send my ID to the remote server
3. if my ID < remote server disconnect and return
4. if my ID > remote server

    * kill current set of SenderWorker and ReceiveWorkter threads for the remote server
    * Start a new pair

Proposed changes:
Move the code that performs any blocking IO in SenderWorker and ReceiveWorker.

A. Proposed flow for receiving connection:
1. accept connection in Listener.run()
2. receiveConnection()

    * kill current set of SenderWorker and ReciveWorker threads
    * Start a new pair

Proposed changed to SenderWorker:

    * Read remote server's ID
    * Take action based on my ID and remote server's ID (disconnect and reconnect if my ID is > remote server's ID).
    * Proceed to normal operation

B Proposed flow for initiating connection:
1. in connectOne(), return if already connected
2. Start a new SenderWorker and ReceiveWorker pair
2. In SenderWorker

    * connect to remote server
    * write my ID
    * if my ID < remote server disconnect and return (shutdown the pair).
    * Proceed to normal operation

Questions:

    * In QuorumCnxManager, is it necessary to kill the current pair and restart a new one every time we receive a connect request?
    * In receiveConnection we may choose to reject an accepted connection if a thread in
      SenderWorker is in the process of connecting. Otherwise a server with ID <
      remote server may keep sending frequent connect request that will result in the
      remote server closing connections for this peer. But I think we add a delay
      before sending notifications, which might be good enough to prevent this
      problem.

Let me know what you think about this. I can also help with the implementation.

Flavio Junqueira added a comment - 03/Nov/10 05:28 PM
Hi Vishal, I like your proposal, it seems reasonable and not difficult to implement.

On your questions:

   1. I don't think it is necessary to kill a pair SenderWorker/RecvWorker every time, and I'd certainly support changing it;
   2. I'm not sure where you're suggesting to introduce a delay. In the FLE code, a server sends a new batch of notifications if it changes its vote or if it times out waiting for a new notification. This timeout value increases over time. I was actually thinking that we should reset the timeout value upon receiving a notification. I think this is a bug....

Given that it is your proposal, I'd be happy to let you take a stab at it and help you out if you need a hand. Does it make sense for you?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (ZOOKEEPER-932) Move blocking read/write calls to SendWorker and RecvWorker Threads

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K updated ZOOKEEPER-932:
-------------------------------

    Attachment: ZOOKEEPER-932.patch

Hi Flavio,

I have attached a diff that moves all blocking read/write calls to
SendWorker and RecvWorker threads. The implementation is pretty much
adheres to what I had proposed.

The basic idea is to perform the logic if determining which connection
to keep (depending on server id) in SenderWorker thread.  Listener
does not perform any IO. Therefore, it is free to accept connections
from other peers. Earlier, if a peer dies after establishing the
connection Listener used to block on a read. Also, toSend() used to
serially connect to remote peers. This used to slow down FLE.

I have done some basic testing of the code. Before I do some more
testing, I wanted to send out the diff and see if you are ok with
these changes.

The patch that I had submitted for ZOOKEEPER-900 eliminates use of
SocketChannel. This attached diff eliminates the need to perform
blocking IO outside of the worker threads.

Let know if you have any comments.

Thanks.
-Vishal

> Move blocking read/write calls to SendWorker and RecvWorker Threads
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-932
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-932
>             Project: Zookeeper
>          Issue Type: Sub-task
>            Reporter: Vishal K
>            Assignee: Vishal K
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-932.patch
>
>
> Copying relevant comments:
> Vishal K added a comment - 02/Nov/10 02:09 PM
> Hi Flavio,
> I have a suggestion for changing the blocking IO code in QuorumCnxManager. It keeps the current code structure and requires a small amount of changes. I am not sure if these comments should go in ZOOKEEPER-901. ZOOKEEPER-901 is probably addressing netty as well. Please feel free to close this JIRA if you intend to make all the changes as a part of ZOOKEEPER-901.
> Basically we jusy need to move parts of initiateConnection and receiveConnection to SenderWorker and ReceiveWorker.
> A. Current flow for receiving connection:
> 1. accept connection in Listener.run()
> 2. receiveConnection()
>     * Read remote server's ID
>     * Take action based on my ID and remote server's ID (disconnect and reconnect if my ID is > remote server's ID).
>     * kill current set of SenderWorker and ReciveWorker threads
>     * Start a new pair
> B Current flow for initiating connection:
> 1. In connectOne(), connect if not already connected. else return.
> 2. send my ID to the remote server
> 3. if my ID < remote server disconnect and return
> 4. if my ID > remote server
>     * kill current set of SenderWorker and ReceiveWorkter threads for the remote server
>     * Start a new pair
> Proposed changes:
> Move the code that performs any blocking IO in SenderWorker and ReceiveWorker.
> A. Proposed flow for receiving connection:
> 1. accept connection in Listener.run()
> 2. receiveConnection()
>     * kill current set of SenderWorker and ReciveWorker threads
>     * Start a new pair
> Proposed changed to SenderWorker:
>     * Read remote server's ID
>     * Take action based on my ID and remote server's ID (disconnect and reconnect if my ID is > remote server's ID).
>     * Proceed to normal operation
> B Proposed flow for initiating connection:
> 1. in connectOne(), return if already connected
> 2. Start a new SenderWorker and ReceiveWorker pair
> 2. In SenderWorker
>     * connect to remote server
>     * write my ID
>     * if my ID < remote server disconnect and return (shutdown the pair).
>     * Proceed to normal operation
> Questions:
>     * In QuorumCnxManager, is it necessary to kill the current pair and restart a new one every time we receive a connect request?
>     * In receiveConnection we may choose to reject an accepted connection if a thread in
>       SenderWorker is in the process of connecting. Otherwise a server with ID <
>       remote server may keep sending frequent connect request that will result in the
>       remote server closing connections for this peer. But I think we add a delay
>       before sending notifications, which might be good enough to prevent this
>       problem.
> Let me know what you think about this. I can also help with the implementation.
> Flavio Junqueira added a comment - 03/Nov/10 05:28 PM
> Hi Vishal, I like your proposal, it seems reasonable and not difficult to implement.
> On your questions:
>    1. I don't think it is necessary to kill a pair SenderWorker/RecvWorker every time, and I'd certainly support changing it;
>    2. I'm not sure where you're suggesting to introduce a delay. In the FLE code, a server sends a new batch of notifications if it changes its vote or if it times out waiting for a new notification. This timeout value increases over time. I was actually thinking that we should reset the timeout value upon receiving a notification. I think this is a bug....
> Given that it is your proposal, I'd be happy to let you take a stab at it and help you out if you need a hand. Does it make sense for you?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.