You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/12/11 19:28:24 UTC

[GitHub] [incubator-mxnet] eric-haibin-lin commented on a change in pull request #17007: [BUGFIX] Fix race condition in kvstore.pushpull

eric-haibin-lin commented on a change in pull request #17007: [BUGFIX] Fix race condition in kvstore.pushpull
URL: https://github.com/apache/incubator-mxnet/pull/17007#discussion_r356790519
 
 

 ##########
 File path: src/kvstore/kvstore_dist_server.h
 ##########
 @@ -364,21 +364,34 @@ class KVStoreDistServer {
       if (log_verbose_)  {
         LOG(INFO) << "sent response to " << update_buf->request.size() << " workers";
       }
+      /**
+       * Request can be for either push, pull or pushpull
+       * If pull flag is set, respond immediately with the updated values
+       * Otherwise, only send the notification
+       */
+      bool has_pull = false;
       for (const auto& req : update_buf->request) {
-        /**
-         * Request can be for either push, pull or pushpull
-         * If pull flag is set, respond immediately with the updated values
-         * Otherwise, only send the notification
-         */
-        if (req.pull) {
-          DefaultStorageResponse(type, key, req, req_data, server);
-        } else {
+        has_pull = has_pull || req.pull;
+      }
+      if (has_pull) {
+        // if there is a pull request, perform WaitToRead() once before DefaultStorageResponse
+        if (has_multi_precision_copy(type)) CopyFromTo(stored, store_[key]);
+        stored.WaitToRead();
+        for (const auto& req : update_buf->request) {
+          if (req.pull) {
+            DefaultStorageResponse(type, key, req, req_data, server);
+          }
+        }
+        update_buf->request.clear();
+      } else {
+        // otherwise, send response directly
+        for (const auto& req : update_buf->request) {
           server->Response(req);
         }
+        update_buf->request.clear();
+        if (has_multi_precision_copy(type)) CopyFromTo(stored, store_[key]);
 
 Review comment:
   The order is different. In this branch, it is done after `Response` with ACK so that we send back message as early as possible. We then push to engine to perform copy. 
   For the previous case, we need to response with real data instead of ACK, we actually need to perform copy first. 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services