You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "rtpsw (via GitHub)" <gi...@apache.org> on 2023/06/03 10:31:34 UTC

[GitHub] [arrow] rtpsw commented on a diff in pull request #34392: GH-34391: [C++] Future as-of-join-node hangs on distant times

rtpsw commented on code in PR #34392:
URL: https://github.com/apache/arrow/pull/34392#discussion_r1215407511


##########
cpp/src/arrow/acero/asof_join_node.cc:
##########
@@ -1059,7 +1232,7 @@ class AsofJoinNode : public ExecNode {
         // If RHS is finished, then we know it's up to date
         if (rhs.CurrentEmpty())
           return false;  // RHS isn't finished, but is empty --> not up to date
-        if (lhs_ts >= rhs.GetCurrentTime())

Review Comment:
   I looked into this to some degree. When '>=' is used instead, as-of-join hangs for the `Basic7Forward` test-case, because both right tables have distant times (right at the beginning) and none of them advances.
   
   I tend to think that '>=' here won't work in general. Consider the case where the left table is at time `t` and a given right table has multiple rows with time `t`. In a past-join, we'd want to advance the right table past `t` and overwrite the stored entry up to the last row with time `t`, which would be ready for joining. In a future-join, we'd want to advance the same way but to store all rows except the first as future entries. Now, consider the case where the given right table has a row with a distant time `t+tolerance+1`. In a future-join, we'd want to advance the right table to this row and store it but only make it current when it is no longer distant. So, in a clearer (if not fixing) solution, I suspect the tolerance horizon needs to be accounted for in the condition here as well as in the storing/joining.
   
   This needs further looking into in a separate issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org