You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by tedxia <gi...@git.apache.org> on 2014/12/15 11:00:42 UTC

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

GitHub user tedxia opened a pull request:

    https://github.com/apache/storm/pull/349

    STORM-593: remove endpoint-socket-lock for worker-data

    PR for [STORM-593](https://issues.apache.org/jira/browse/STORM-593)
    
    cached-node+port->socket in worker-data is atom, there on need for rwlock endpoint-socket-lock to protect cached-node+port->socket. And after use rwlock, there will be competition between refresh-connections and message send.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tedxia/incubator-storm remove-endpoint-socket-lock

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/349.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #349
    
----
commit 5210fce329571da2b6121c8ba941d3c7774face1
Author: xiajun <xi...@xiaomi.com>
Date:   2014-12-15T09:56:53Z

    remove endpoint-socket-lock in worker.clj

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by d2r <gi...@git.apache.org>.
Github user d2r commented on the pull request:

    https://github.com/apache/storm/pull/349#issuecomment-87769455
  
    > The send code should be updated to fix this.
    
    @nathanmarz, should we also extend the write lock? [comment](https://github.com/apache/storm/pull/349#discussion_r27413373)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by nathanmarz <gi...@git.apache.org>.
Github user nathanmarz commented on the pull request:

    https://github.com/apache/storm/pull/349#issuecomment-87767343
  
    Taking a look at it again, it looks like the introduction of TransferDrainer is a regression. The way that class works doesn't make a whole lot of sense... it has separate add and send methods but those are only ever used in the same method. Looking up the node+port for the task should happen in the read lock as well.
    
    The send code should be updated to fix this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by d2r <gi...@git.apache.org>.
Github user d2r commented on a diff in the pull request:

    https://github.com/apache/storm/pull/349#discussion_r27404108
  
    --- Diff: storm-core/src/clj/backtype/storm/daemon/worker.clj ---
    @@ -335,16 +333,14 @@
             drainer (TransferDrainer.)
             node+port->socket (:cached-node+port->socket worker)
    --- End diff --
    
    There is a write-lock around `:cached-node+port->socket`, but we do not use it here in this function after binding it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by tedxia <gi...@git.apache.org>.
Github user tedxia commented on the pull request:

    https://github.com/apache/storm/pull/349#issuecomment-66996013
  
    The write lock only protect cached-task->node+port, and not protect cached-node+port->socket. 
    @nathanmarz If we want to ensure send never called on a closed connection, should we also protect cached-node+port->socket either?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by nathanmarz <gi...@git.apache.org>.
Github user nathanmarz commented on the pull request:

    https://github.com/apache/storm/pull/349#issuecomment-87742572
  
    Without the r/w lock the following can happen:
    
    1. Sender S grabs a socket to worker A for task B
    2. Refresh connections learns task B is now on worker C, changes the connection, and closes the old socket
    3. Sender S tries to send data on a closed socket
    
    An atom does not protect against this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by d2r <gi...@git.apache.org>.
Github user d2r commented on a diff in the pull request:

    https://github.com/apache/storm/pull/349#discussion_r27404112
  
    --- Diff: storm-core/src/clj/backtype/storm/daemon/worker.clj ---
    @@ -335,16 +333,14 @@
             drainer (TransferDrainer.)
             node+port->socket (:cached-node+port->socket worker)
             task->node+port (:cached-task->node+port worker)
    -        endpoint-socket-lock (:endpoint-socket-lock worker)
             ]
         (disruptor/clojure-handler
           (fn [packets _ batch-end?]
             (.add drainer packets)
             
             (when batch-end?
    -          (read-locked endpoint-socket-lock
    -            (let [node+port->socket @node+port->socket]
    -              (.send drainer node+port->socket)))
    +          (let [node+port->socket @node+port->socket]
    +            (.send drainer node+port->socket))
    --- End diff --
    
    @nathanmarz, I could be missing something, but I do not think we need a read lock here to protect against the send:
    
    There is no corresponding write-lock around `:cached-node+port->socket`.  (I do not think we need one anyway, since it is updated via a `swap!` in `mk-refresh-connections`, and that should be atomic.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by d2r <gi...@git.apache.org>.
Github user d2r commented on a diff in the pull request:

    https://github.com/apache/storm/pull/349#discussion_r27418768
  
    --- Diff: storm-core/src/clj/backtype/storm/daemon/worker.clj ---
    @@ -302,9 +301,8 @@
                                port)
                               ]
                              )))
    -              (write-locked (:endpoint-socket-lock worker)
    -                (reset! (:cached-task->node+port worker)
    -                        (HashMap. my-assignment)))
    --- End diff --
    
    We should not extend the write lock, we should check tasks->node+port in the read lock when we are sending. See [comment](https://github.com/apache/storm/pull/349#issuecomment-87778672)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by nathanmarz <gi...@git.apache.org>.
Github user nathanmarz commented on a diff in the pull request:

    https://github.com/apache/storm/pull/349#discussion_r27413411
  
    --- Diff: storm-core/src/clj/backtype/storm/daemon/worker.clj ---
    @@ -335,16 +333,14 @@
             drainer (TransferDrainer.)
             node+port->socket (:cached-node+port->socket worker)
             task->node+port (:cached-task->node+port worker)
    -        endpoint-socket-lock (:endpoint-socket-lock worker)
             ]
         (disruptor/clojure-handler
           (fn [packets _ batch-end?]
             (.add drainer packets)
             
             (when batch-end?
    -          (read-locked endpoint-socket-lock
    -            (let [node+port->socket @node+port->socket]
    -              (.send drainer node+port->socket)))
    +          (let [node+port->socket @node+port->socket]
    +            (.send drainer node+port->socket))
    --- End diff --
    
    Taking a look at it again, it looks like the introduction of TransferDrainer is a regression. The way that class works doesn't make a whole lot of sense... it has separate add and send methods but those are only ever used in the same method. Looking up the node+port for the task should happen in the read lock as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by d2r <gi...@git.apache.org>.
Github user d2r commented on the pull request:

    https://github.com/apache/storm/pull/349#issuecomment-87789524
  
    Yes, I see before 861a92eab8740cfc0f83ac4d7ade9a2ab04a8b9b we checked task->node+port.
    
    I'll file an issue to track, since it is a different problem/fix than covered by this JIRA issue and pull request.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by d2r <gi...@git.apache.org>.
Github user d2r commented on a diff in the pull request:

    https://github.com/apache/storm/pull/349#discussion_r27413373
  
    --- Diff: storm-core/src/clj/backtype/storm/daemon/worker.clj ---
    @@ -302,9 +301,8 @@
                                port)
                               ]
                              )))
    -              (write-locked (:endpoint-socket-lock worker)
    -                (reset! (:cached-task->node+port worker)
    -                        (HashMap. my-assignment)))
    --- End diff --
    
    OK, I am convinced we do need the lock here.  The lock is not so much protecting `:cached-node+port->socket` as it is guaranteeing (along with the read-lock below) that anyone accessing the atom after we enter this lock will get the data in new HashMap that has been swapped into it.
    
    In fact, it seems there could still be a race we send to an endpoint after it is closed, but before it is removed.  We might want to extend the write lock to protect readers getting a mapping with non-null, closed connections:
    
    ```Diff
                   (write-locked (:endpoint-socket-lock worker)
                     (reset! (:cached-task->node+port worker)
    -                        (HashMap. my-assignment)))
    -              (doseq [endpoint remove-connections]
    -                (.close (get @(:cached-node+port->socket worker) endpoint)))
    -              (apply swap!
    -                     (:cached-node+port->socket worker)
    -                     #(HashMap. (apply dissoc (into {} %1) %&))
    -                     remove-connections)
    +                        (HashMap. my-assignment))
    +                (doseq [endpoint remove-connections]
    +                  (.close (get @(:cached-node+port->socket worker) endpoint)))
    +                (apply swap!
    +                       (:cached-node+port->socket worker)
    +                       #(HashMap. (apply dissoc (into {} %1) %&))
    +                       remove-connections))
    ```
    It is OK if the packets are sent to a null IConnection, since TransferDrainer does a [null check](https://github.com/apache/storm/blob/36e99fa2dfdd13cd43d8fa8c558c670cd7750ed0/storm-core/src/jvm/backtype/storm/utils/TransferDrainer.java#L50).  But it is not OK to try and send with a closed, but non-null connection.
    
    Maybe it would be good to add documentation near `write-locked` explaining that it is a barrier to anyone else reading from the node+port->socket mapping.  Otherwise, it is not clear how it is helping.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by d2r <gi...@git.apache.org>.
Github user d2r commented on a diff in the pull request:

    https://github.com/apache/storm/pull/349#discussion_r27414030
  
    --- Diff: storm-core/src/clj/backtype/storm/daemon/worker.clj ---
    @@ -335,16 +333,14 @@
             drainer (TransferDrainer.)
             node+port->socket (:cached-node+port->socket worker)
             task->node+port (:cached-task->node+port worker)
    -        endpoint-socket-lock (:endpoint-socket-lock worker)
             ]
         (disruptor/clojure-handler
           (fn [packets _ batch-end?]
             (.add drainer packets)
             
             (when batch-end?
    -          (read-locked endpoint-socket-lock
    -            (let [node+port->socket @node+port->socket]
    -              (.send drainer node+port->socket)))
    +          (let [node+port->socket @node+port->socket]
    +            (.send drainer node+port->socket))
    --- End diff --
    
    We do need this. See [comment](https://github.com/apache/storm/pull/349/files#r27413373)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by d2r <gi...@git.apache.org>.
Github user d2r commented on a diff in the pull request:

    https://github.com/apache/storm/pull/349#discussion_r27404105
  
    --- Diff: storm-core/src/clj/backtype/storm/daemon/worker.clj ---
    @@ -302,9 +301,8 @@
                                port)
                               ]
                              )))
    -              (write-locked (:endpoint-socket-lock worker)
    -                (reset! (:cached-task->node+port worker)
    -                        (HashMap. my-assignment)))
    --- End diff --
    
    The only other place `:cached-task->node+port` is used, it is only a read in `mk-transfer-fn`.  Since it is already an atom, we probably do not need this lock.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by tedxia <gi...@git.apache.org>.
Github user tedxia closed the pull request at:

    https://github.com/apache/storm/pull/349


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by d2r <gi...@git.apache.org>.
Github user d2r commented on the pull request:

    https://github.com/apache/storm/pull/349#issuecomment-87791955
  
    [STORM-737](https://issues.apache.org/jira/browse/STORM-737)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by nathanmarz <gi...@git.apache.org>.
Github user nathanmarz commented on the pull request:

    https://github.com/apache/storm/pull/349#issuecomment-87778672
  
    Looking inside task->node+port needs to be done inside the read lock (as it was done before this TaskDrainer class got inserted here). If that's done, there's no possible way for the sender to get its hands on a connection that's closed or going to be closed while it's using it.
    
    To elaborate on that last point: once the write lock code is finished, all entries in task->node+port are up to date and accurate, and no connections for those node+ports will be closed. This means it's impossible for the code in read-locked to get a to-be-closed connection since it gets everything by looking at task->node+port first.
    
    The write-locked code was carefully written this way to minimize the amount of work that actually happens in the write lock. This is important because when that code is active, tasks are prevented from sending messages. The way it is now, it's just setting the value of a single in-memory atom – so basically no work at all. If it were making/closing connections, it would be a completely different story.
    
    In short, the code in the write-lock is fine – it's the code in the read lock that needs to be fixed. As part of that, the code looking up the node+port for a task needs to be moved back to this function and not happen before the tuple goes on the transfer queue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-593: remove endpoint-socket-lock for wor...

Posted by nathanmarz <gi...@git.apache.org>.
Github user nathanmarz commented on the pull request:

    https://github.com/apache/storm/pull/349#issuecomment-66975846
  
    -1. That r/w lock ensures that send is never called on a closed connection. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---