Posted to issues@arrow.apache.org by GitBox <gi...@apache.org> on 2023/01/12 19:38:44 UTC

[GitHub] [arrow] rtpsw opened a new issue, #33640: [C++] as-of-join backpressure for large sources

rtpsw opened a new issue, #33640:
URL: https://github.com/apache/arrow/issues/33640

   ### Describe the enhancement requested
   
   Add backpressure support to as-of-join in order to handle large sources. This is a merge into master of (at least most of) https://github.com/westonpace/arrow/pull/21/files
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #33640: [C++] as-of-join backpressure for large sources

Posted by GitBox <gi...@apache.org>.
westonpace commented on issue #33640:
URL: https://github.com/apache/arrow/issues/33640#issuecomment-1381330671

   Turns out we hadn't added https://github.com/westonpace/arrow/commit/45791de8311b0c2e2525e72f4c4746cc3b4364e3 anyways.  So I've combined both "AsofJoin backpressure" and "backpressure for large sources" into a single PR.
   
   As for the difference:
   
   The first PR (asofjoin backpressure) added backpressure to the asof join node.  However, when run on large sources, it tended to crash.  The main problem was the "can't shutdown plan from the process thread" problem, which has already been solved on master.  So that is why it seemed that "backpressure for large sources" wasn't adding much.
   
   However, "backpressure for large sources" added unit tests to reproduce the issue, and these are valuable, so I've included them.




[GitHub] [arrow] rtpsw commented on issue #33640: [C++] as-of-join backpressure for large sources

Posted by GitBox <gi...@apache.org>.
rtpsw commented on issue #33640:
URL: https://github.com/apache/arrow/issues/33640#issuecomment-1380934066

   cc @westonpace @icexelloss 




[GitHub] [arrow] rtpsw commented on issue #33640: [C++] as-of-join backpressure for large sources

Posted by GitBox <gi...@apache.org>.
rtpsw commented on issue #33640:
URL: https://github.com/apache/arrow/issues/33640#issuecomment-1381026420

   In Acero it's the sending nodes that push data rather than the receiving nodes that pull it, so a large source can blow up the as-of-join queues (and other data structures) by pushing faster than some other source, whose lagging prevents the as-of-join node from clearing entries from memory. With this PR, as-of-join would use backpressure to hold up the faster source in order to keep memory usage within bounds.
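To make the mechanism concrete, here is a minimal sketch of push-side backpressure with high and low thresholds. It is not Acero's implementation or API: `BackpressureQueue`, `ToySource`, and the threshold values are hypothetical names invented for illustration, and the sketch is single-threaded, whereas a real plan needs synchronization. It assumes `low < high`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical illustration only: none of these names are Acero APIs.
// A queue that asks its producer to pause once it holds `high` batches
// and to resume once it has drained down to `low` batches.
class BackpressureQueue {
 public:
  BackpressureQueue(std::size_t low, std::size_t high,
                    std::function<void()> pause,
                    std::function<void()> resume)
      : low_(low), high_(high),
        pause_(std::move(pause)), resume_(std::move(resume)) {}

  // Called by the pushing source for every batch it produces.
  void Push(int batch) {
    items_.push_back(batch);
    if (!paused_ && items_.size() >= high_) {
      paused_ = true;
      pause_();  // hold up the faster source
    }
  }

  // Called by the consumer once a lagging input has caught up
  // and the oldest entry can be released.
  bool Pop(int* out) {
    if (items_.empty()) return false;
    *out = items_.front();
    items_.pop_front();
    if (paused_ && items_.size() <= low_) {
      paused_ = false;
      resume_();  // let the source push again
    }
    return true;
  }

  std::size_t size() const { return items_.size(); }

 private:
  std::size_t low_;
  std::size_t high_;
  std::function<void()> pause_;
  std::function<void()> resume_;
  std::deque<int> items_;
  bool paused_ = false;
};

// A toy source that pushes batches until it is told to pause.
struct ToySource {
  bool paused = false;
  int next = 0;

  // Pushes at most `n` batches and returns how many were actually pushed.
  int PushUpTo(BackpressureQueue& queue, int n) {
    int pushed = 0;
    while (pushed < n && !paused) {
      queue.Push(next++);
      ++pushed;
    }
    return pushed;
  }
};
```

With thresholds of 2 and 4, a source that tries to push 100 batches is stopped after 4, so the queue stays bounded no matter how far the other input lags. Using two thresholds rather than one avoids toggling pause and resume on every single batch.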




[GitHub] [arrow] icexelloss commented on issue #33640: [C++] as-of-join backpressure for large sources

Posted by GitBox <gi...@apache.org>.
icexelloss commented on issue #33640:
URL: https://github.com/apache/arrow/issues/33640#issuecomment-1380971248

   @rtpsw Do you remember what the patch is supposed to do? What about "large sources" didn't work before the PR, and how does the PR fix that? It has been a while and I don't quite remember, and the original PR doesn't seem to have much context either.




[GitHub] [arrow] icexelloss commented on issue #33640: [C++] as-of-join backpressure for large sources

Posted by GitBox <gi...@apache.org>.
icexelloss commented on issue #33640:
URL: https://github.com/apache/arrow/issues/33640#issuecomment-1381992361

   Thanks for the explanation. +1 for combining these two PRs then.




[GitHub] [arrow] icexelloss commented on issue #33640: [C++] as-of-join backpressure for large sources

Posted by GitBox <gi...@apache.org>.
icexelloss commented on issue #33640:
URL: https://github.com/apache/arrow/issues/33640#issuecomment-1381030693

   What you are describing sounds like the backpressure feature implemented in this commit:
   https://github.com/westonpace/arrow/commit/45791de8311b0c2e2525e72f4c4746cc3b4364e3
   
   What does this PR do or improve that is not in the first "AsofJoin backpressure" PR?




[GitHub] [arrow] westonpace closed issue #33640: [C++] as-of-join backpressure for large sources

Posted by GitBox <gi...@apache.org>.
westonpace closed issue #33640: [C++] as-of-join backpressure for large sources
URL: https://github.com/apache/arrow/issues/33640

