You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "rtpsw (via GitHub)" <gi...@apache.org> on 2023/06/15 13:02:04 UTC

[GitHub] [arrow] rtpsw opened a new pull request, #36094: GH-36092: [C++] Simplify concurrency in as-of-join node

rtpsw opened a new pull request, #36094:
URL: https://github.com/apache/arrow/pull/36094

   ### What changes are included in this PR?
   
   The key hasher invalidation and memo-store time updating are moved to the processing thread.
   
   ### Are these changes tested?
   
   Yes, by existing tests.
   
   ### Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #36094: GH-36092: [C++] Simplify concurrency in as-of-join node

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #36094:
URL: https://github.com/apache/arrow/pull/36094#issuecomment-1601683223

   @rtpsw Looks like you added another atomic variable here:
   https://github.com/apache/arrow/pull/35874/files#r1218191706
   
   Can you fix that one too?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #36094: GH-36092: [C++] Simplify concurrency in as-of-join node

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #36094:
URL: https://github.com/apache/arrow/pull/36094#issuecomment-1602442838

   > This makes sense to me. Let me verify I understand:
   > 
   > Before, on InputReceived, we would update the memo for an input based on the new batch.
   > 
   > Now, the only thing that really happens on InputReceived, is the batch is placed into the appropriate input queue. The update to the memo table happens when teh processing thread runs, as it is processing batches for an input.
   
   As @icexelloss said, this is correct. However, there is another difference to note. The pre-PR code invalidates the key hasher and updates the memo-store time given any batch, whereas the post-PR code does so given all but the first batch (after `batches_processed_` is incremented). This is fine because the key hasher is initialized as invalid and the memo-store time is initialized to the lowest value.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #36094: GH-36092: [C++] Simplify concurrency in as-of-join node

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #36094:
URL: https://github.com/apache/arrow/pull/36094#issuecomment-1594184223

   cc @westonpace, @icexelloss


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #36094: GH-36092: [C++] Simplify concurrency in as-of-join node

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #36094:
URL: https://github.com/apache/arrow/pull/36094#issuecomment-1601681249

   > This makes sense to me. Let me verify I understand:
   > 
   > Before, on InputReceived, we would update the memo for an input based on the new batch.
   > 
   > Now, the only thing that really happens on InputReceived, is the batch is placed into the appropriate input queue. The update to the memo table happens when teh processing thread runs, as it is processing batches for an input.
   
   Yes exactly


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss merged pull request #36094: GH-36092: [C++] Simplify concurrency in as-of-join node

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss merged PR #36094:
URL: https://github.com/apache/arrow/pull/36094


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #36094: GH-36092: [C++] Simplify concurrency in as-of-join node

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #36094:
URL: https://github.com/apache/arrow/pull/36094#issuecomment-1601684126

   > @rtpsw Looks like you added another atomic variable here: https://github.com/apache/arrow/pull/35874/files#r1218191706
   > 
   > Can you fix that one too?
   
   Nvm looks like you fixed this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on a diff in pull request #36094: GH-36092: [C++] Simplify concurrency in as-of-join node

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on code in PR #36094:
URL: https://github.com/apache/arrow/pull/36094#discussion_r1238357290


##########
cpp/src/arrow/acero/asof_join_node.cc:
##########
@@ -364,21 +364,16 @@ struct MemoStore {
     std::swap(index_, memo.index_);
 #endif
     std::swap(no_future_, memo.no_future_);
-    current_time_ = memo.current_time_.exchange(static_cast<OnType>(current_time_));
+    std::swap(current_time_, memo.current_time_);
     entries_.swap(memo.entries_);
     future_entries_.swap(memo.future_entries_);
     times_.swap(memo.times_);
   }
 
-  // Updates the current time to `ts` if it is less. A different thread may win the race
-  // to update the current time to more than `ts` but not to less. Returns whether the
-  // current time was changed from its value at the beginning of this invocation.
+  // Updates the current time to `ts` if it is less. Returns true if updated.
   bool UpdateTime(OnType ts) {
-    OnType prev_time = current_time_;
-    bool update = prev_time < ts;
-    while (prev_time < ts && !current_time_.compare_exchange_weak(prev_time, ts)) {
-      // intentionally empty - standard CAS loop
-    }
+    bool update = current_time_ < ts;
+    if (update) current_time_ = ts;

Review Comment:
   The method still needs to return a Boolean, so we can't remove the `bool update` line, but we can replace the if-statement.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #36094: GH-36092: [C++] Simplify concurrency in as-of-join node

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #36094:
URL: https://github.com/apache/arrow/pull/36094#issuecomment-1593030265

   cc @westonpace, @icexelloss


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #36094: GH-36092: [C++] Simplify concurrency in as-of-join node

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #36094:
URL: https://github.com/apache/arrow/pull/36094#discussion_r1237684152


##########
cpp/src/arrow/acero/asof_join_node.cc:
##########
@@ -364,21 +364,16 @@ struct MemoStore {
     std::swap(index_, memo.index_);
 #endif
     std::swap(no_future_, memo.no_future_);
-    current_time_ = memo.current_time_.exchange(static_cast<OnType>(current_time_));
+    std::swap(current_time_, memo.current_time_);
     entries_.swap(memo.entries_);
     future_entries_.swap(memo.future_entries_);
     times_.swap(memo.times_);
   }
 
-  // Updates the current time to `ts` if it is less. A different thread may win the race
-  // to update the current time to more than `ts` but not to less. Returns whether the
-  // current time was changed from its value at the beginning of this invocation.
+  // Updates the current time to `ts` if it is less. Returns true if updated.
   bool UpdateTime(OnType ts) {
-    OnType prev_time = current_time_;
-    bool update = prev_time < ts;
-    while (prev_time < ts && !current_time_.compare_exchange_weak(prev_time, ts)) {
-      // intentionally empty - standard CAS loop
-    }
+    bool update = current_time_ < ts;
+    if (update) current_time_ = ts;

Review Comment:
   ```suggestion
       current_time_ = std::min(current_time_, ts);
   ```
   Minor nit: Maybe simpler?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36094: GH-36092: [C++] Simplify concurrency in as-of-join node

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36094:
URL: https://github.com/apache/arrow/pull/36094#issuecomment-1606126779

   Conbench analyzed the 6 benchmark runs on commit `1b29d958`.
   
   There were no benchmark performance regressions. 🎉
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14535245752) has more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org