Posted to issues@arrow.apache.org by GitBox <gi...@apache.org> on 2023/01/12 19:38:44 UTC
[GitHub] [arrow] rtpsw opened a new issue, #33640: [C++] as-of-join backpressure for large sources
rtpsw opened a new issue, #33640:
URL: https://github.com/apache/arrow/issues/33640
### Describe the enhancement requested
Add backpressure support to as-of-join in order to handle large sources. This is a merge into master of (at least most of) https://github.com/westonpace/arrow/pull/21/files
### Component(s)
C++
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on issue #33640: [C++] as-of-join backpressure for large sources
Posted by GitBox <gi...@apache.org>.
westonpace commented on issue #33640:
URL: https://github.com/apache/arrow/issues/33640#issuecomment-1381330671
Turns out we hadn't added https://github.com/westonpace/arrow/commit/45791de8311b0c2e2525e72f4c4746cc3b4364e3 anyways. So I've combined both "AsofJoin backpressure" and "backpressure for large sources" into a single PR.
As for the difference:
The first PR (asofjoin backpressure) added backpressure to the asof join node. However, when run on large sources, it tended to crash. The main problem was the "can't shutdown plan from the process thread" problem, which has already been solved on master. So that is why it seemed that "backpressure for large sources" wasn't adding much.
However, "backpressure for large sources" added unit tests to reproduce the issue, and these are valuable, so I've included them.
[GitHub] [arrow] rtpsw commented on issue #33640: [C++] as-of-join backpressure for large sources
Posted by GitBox <gi...@apache.org>.
rtpsw commented on issue #33640:
URL: https://github.com/apache/arrow/issues/33640#issuecomment-1380934066
cc @westonpace @icexelloss
[GitHub] [arrow] rtpsw commented on issue #33640: [C++] as-of-join backpressure for large sources
Posted by GitBox <gi...@apache.org>.
rtpsw commented on issue #33640:
URL: https://github.com/apache/arrow/issues/33640#issuecomment-1381026420
In Acero it's the sending nodes that push data rather than the receiving nodes that pull it, so a large source can blow up the as-of-join queues (and other data structures) by pushing faster than some other source, whose lagging prevents the as-of-join node from clearing entries from memory. With this PR, as-of-join would use backpressure to hold up the faster source in order to keep memory usage within bounds.
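The mechanism described above can be sketched, very roughly, as a per-input queue with a high and a low watermark. This is a single-threaded illustration only, not the code from the PR: `BackpressureQueue`, its `pause`/`resume` callbacks, and the `int` batch type are all hypothetical names chosen for this sketch, and a real node would also need locking.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical sketch, not Acero's API. One queue per join input.
// The queue asks its source to pause once it holds `high` batches and
// to resume once the join has drained it down to `low` batches.
class BackpressureQueue {
 public:
  BackpressureQueue(std::size_t low, std::size_t high,
                    std::function<void()> pause, std::function<void()> resume)
      : low_(low), high_(high),
        pause_(std::move(pause)), resume_(std::move(resume)) {
    assert(low_ < high_);
  }

  // Called by the pushing source. Batches already in flight are still
  // accepted after a pause request, so `high` is a soft limit.
  void Push(int batch) {
    queue_.push_back(batch);
    if (!paused_ && queue_.size() >= high_) {
      paused_ = true;
      pause_();  // hold up the faster source
    }
  }

  // Called by the join once the lagging input has advanced far enough
  // that the oldest batch can be processed and released.
  int Pop() {
    assert(!queue_.empty());
    int batch = queue_.front();
    queue_.pop_front();
    if (paused_ && queue_.size() <= low_) {
      paused_ = false;
      resume_();  // let the source push again
    }
    return batch;
  }

  std::size_t size() const { return queue_.size(); }
  bool paused() const { return paused_; }

 private:
  std::size_t low_;
  std::size_t high_;
  std::function<void()> pause_;
  std::function<void()> resume_;
  std::deque<int> queue_;
  bool paused_ = false;
};
```

Using two thresholds rather than one keeps the source from being paused and resumed on every batch when the queue hovers around the limit.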
[GitHub] [arrow] icexelloss commented on issue #33640: [C++] as-of-join backpressure for large sources
Posted by GitBox <gi...@apache.org>.
icexelloss commented on issue #33640:
URL: https://github.com/apache/arrow/issues/33640#issuecomment-1380971248
@rtpsw Do you remember what the patch is supposed to do? What about "large sources" didn't work before the PR, and how does the PR fix that? It has been a while, so I don't quite remember, and the original PR doesn't seem to have much context either.
[GitHub] [arrow] icexelloss commented on issue #33640: [C++] as-of-join backpressure for large sources
Posted by GitBox <gi...@apache.org>.
icexelloss commented on issue #33640:
URL: https://github.com/apache/arrow/issues/33640#issuecomment-1381992361
Thanks for the explanation. +1 for combining these two PRs then.
[GitHub] [arrow] icexelloss commented on issue #33640: [C++] as-of-join backpressure for large sources
Posted by GitBox <gi...@apache.org>.
icexelloss commented on issue #33640:
URL: https://github.com/apache/arrow/issues/33640#issuecomment-1381030693
What you are describing seems like the backpressure feature implemented in this PR:
https://github.com/westonpace/arrow/commit/45791de8311b0c2e2525e72f4c4746cc3b4364e3
What does this PR do or improve that is not in the first "AsofJoin backpressure" PR?
[GitHub] [arrow] westonpace closed issue #33640: [C++] as-of-join backpressure for large sources
Posted by GitBox <gi...@apache.org>.
westonpace closed issue #33640: [C++] as-of-join backpressure for large sources
URL: https://github.com/apache/arrow/issues/33640