You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/13 15:12:21 UTC

[GitHub] [arrow] rdettai opened a new pull request #8658: ARROW-10577: [Rust][DataFusion] Hash Aggregator stream finishes unexpectedly after going to Pending state

rdettai opened a new pull request #8658:
URL: https://github.com/apache/arrow/pull/8658


   > This happens when executing a DataFusion query plan with hash aggregation where the data source is not ready on the first call by the Executor, and the async state machine is passed to a pending state
   > 
   > In the Stream implem of GroupedHashAggregateStream and HashAggregateStream, the state is set to self.finished = true on the first call to poll_next(). If the inner stream is Poll::Pending on the first call, this means that the next call resolves to Poll::Ready(None), thus finishing the stream instead of actually consuming the inner data.
   > 
   > I think that it does not happen with most current sources because they never trigger the Poll::Pending state. Parquet is implemented with a blocking call inside poll_next() (which is also problematic but an other issue), Memory yields directly, and CSV also always yields Poll::Ready
   > 
   > An analysis should be performed on all physical plans to check if the issue occurs in other places.
   
   
   https://issues.apache.org/jira/browse/ARROW-10577


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8658: ARROW-10577: [Rust][DataFusion] Hash Aggregator stream finishes unexpectedly after going to Pending state

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8658:
URL: https://github.com/apache/arrow/pull/8658#issuecomment-726864283


   https://github.com/apache/arrow/pull/8553 maybe be also related -- but it is a much more invasive change than this PR


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8658: ARROW-10577: [Rust][DataFusion] Hash Aggregator stream finishes unexpectedly after going to Pending state

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8658:
URL: https://github.com/apache/arrow/pull/8658#issuecomment-726825009


   https://issues.apache.org/jira/browse/ARROW-10577


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8658: ARROW-10577: [Rust][DataFusion] HashAggregator stream finishes unexpectedly after going to Pending state - tests

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8658:
URL: https://github.com/apache/arrow/pull/8658#issuecomment-727561962


   FYI https://github.com/apache/arrow/pull/8553 has been merged


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #8658: ARROW-10577: [Rust][DataFusion] HashAggregator stream finishes unexpectedly after going to Pending state - tests

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #8658:
URL: https://github.com/apache/arrow/pull/8658


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8658: ARROW-10577: [Rust][DataFusion] Hash Aggregator stream finishes unexpectedly after going to Pending state

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8658:
URL: https://github.com/apache/arrow/pull/8658#issuecomment-726864487


   I will try and review it later today or tomorrow morning (UTC+5) time.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rdettai edited a comment on pull request #8658: ARROW-10577: [Rust][DataFusion] Hash Aggregator stream finishes unexpectedly after going to Pending state

Posted by GitBox <gi...@apache.org>.
rdettai edited a comment on pull request #8658:
URL: https://github.com/apache/arrow/pull/8658#issuecomment-726822182


   I initially considered creating a first class citizen `YieldingExec` (https://gist.github.com/rdettai/c2045be688d457cb346c41e8769ed5d8), but as it will likely only be used in the tests for the aggregator, I preferred making a test scoped executor instead. If you see any context where this `YieldingExec` could be useful, I would be more than happy to commit it 😄 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rdettai commented on pull request #8658: ARROW-10577: [Rust][DataFusion] Hash Aggregator stream finishes unexpectedly after going to Pending state

Posted by GitBox <gi...@apache.org>.
rdettai commented on pull request #8658:
URL: https://github.com/apache/arrow/pull/8658#issuecomment-726879119


   Well, it is great that I did not start fixing the problem and I first focused on building a test that pinpointed the issue. Long live the TDD! 😄
   
   I'll rebase this as soon as you have merged #8553, and these tests should pass without changing anything else! 
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rdettai commented on pull request #8658: ARROW-10577: [Rust][DataFusion] HashAggregator stream finishes unexpectedly after going to Pending state - tests

Posted by GitBox <gi...@apache.org>.
rdettai commented on pull request #8658:
URL: https://github.com/apache/arrow/pull/8658#issuecomment-727635074


   @alamb except for Travis that is still stuck, tests are passing after the rebase! If you are okay with the way I structured the test fixture, I think this can be merged!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rdettai commented on pull request #8658: ARROW-10577: [Rust][DataFusion] Hash Aggregator stream finishes unexpectedly after going to Pending state

Posted by GitBox <gi...@apache.org>.
rdettai commented on pull request #8658:
URL: https://github.com/apache/arrow/pull/8658#issuecomment-726822182


   I initially considered creating a first class citizen `YieldingExec` (https://gist.github.com/rdettai/c2045be688d457cb346c41e8769ed5d8), but as it will likely only be used in the tests for the aggregator, I preferred making a test scoped executor instead. If you see any context where this YieldingExec could be useful, I would be more than happy to commit it 😄 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org