Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/24 16:53:05 UTC

[GitHub] [arrow-datafusion] andygrove opened a new issue #47: DataFrame.collect() should return async stream rather than a Vec

andygrove opened a new issue #47:
URL: https://github.com/apache/arrow-datafusion/issues/47


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   Currently, any DataFrame implementation must load the entire result set into RAM in the `collect()` method because it has to return a `Vec<RecordBatch>`. 
   
   **Describe the solution you'd like**
   Change the signature to return an async stream of batches.
   
   **Describe alternatives you've considered**
   None
   
   **Additional context**
   None
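
The signature change requested above can be illustrated with a small, std-only sketch. Everything in it is an assumption made for illustration, not DataFusion's API: `Batch` stands in for `RecordBatch`, `BatchStream` mirrors the shape of `futures::Stream`, `GeneratedBatches` is a hypothetical source, and `drive` is a hand-rolled poll loop used in place of a real async runtime. It contrasts a `Vec`-returning consumer, which holds every batch in memory at once, with a streaming consumer that only keeps a running total.

```rust
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for Arrow's `RecordBatch` (assumption): just a batch of values.
type Batch = Vec<i32>;

/// Minimal stand-in for an async stream, shaped like `futures::Stream`.
trait BatchStream {
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Batch>>;
}

/// Hypothetical source that produces `total` batches lazily, one per poll.
struct GeneratedBatches {
    next: i32,
    total: i32,
}

impl BatchStream for GeneratedBatches {
    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<Batch>> {
        if self.next == self.total {
            return Poll::Ready(None); // stream exhausted
        }
        let batch = vec![self.next; 2];
        self.next += 1;
        Poll::Ready(Some(batch))
    }
}

/// Waker that does nothing; enough for a stream that is always ready.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the stream to completion on the current thread, passing each batch to `f`.
fn drive<S: BatchStream + Unpin>(mut stream: S, mut f: impl FnMut(Batch)) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        match Pin::new(&mut stream).poll_next(&mut cx) {
            Poll::Ready(Some(batch)) => f(batch),
            Poll::Ready(None) => break,
            Poll::Pending => std::thread::yield_now(),
        }
    }
}

/// Current shape: the whole result set is materialized before returning.
fn collect(stream: impl BatchStream + Unpin) -> Vec<Batch> {
    let mut out = Vec::new();
    drive(stream, |batch| out.push(batch));
    out
}

/// Streaming shape: each batch is processed and dropped as it arrives.
fn sum_streaming(stream: impl BatchStream + Unpin) -> i64 {
    let mut total = 0i64;
    drive(stream, |batch| {
        total += batch.iter().map(|&v| i64::from(v)).sum::<i64>();
    });
    total
}

fn main() {
    let all = collect(GeneratedBatches { next: 0, total: 3 });
    println!("collected {} batches", all.len());

    let total = sum_streaming(GeneratedBatches { next: 0, total: 3 });
    println!("streamed sum = {}", total);
}
```

In the streaming variant, peak memory is bounded by one batch rather than by the size of the full result, which is the motivation given in the issue.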


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] djKooks commented on issue #47: DataFrame.collect() should return async stream rather than a Vec

Posted by GitBox <gi...@apache.org>.
djKooks commented on issue #47:
URL: https://github.com/apache/arrow-datafusion/issues/47#issuecomment-841755336


   @andygrove I'd like to work on this if nobody has started on it yet 🙏





[GitHub] [arrow-datafusion] djKooks edited a comment on issue #47: DataFrame.collect() should return async stream rather than a Vec

Posted by GitBox <gi...@apache.org>.
djKooks edited a comment on issue #47:
URL: https://github.com/apache/arrow-datafusion/issues/47#issuecomment-846565698


   @andygrove currently `collect` gathers the `SendableRecordBatchStream` into a `Vec` inside the function before returning.
   
   ```
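   // Execute partition 0 of the physical plan, which yields a stream of record batches.
   let it: SendableRecordBatchStream = plan.execute(0).await?;
   // Drain the whole stream into a Vec<RecordBatch> before returning.
   common::collect(it).await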
   ```
   
   Do you intend for it to return the stream without collecting it first?
   





[GitHub] [arrow-datafusion] djKooks commented on issue #47: DataFrame.collect() should return async stream rather than a Vec

Posted by GitBox <gi...@apache.org>.
djKooks commented on issue #47:
URL: https://github.com/apache/arrow-datafusion/issues/47#issuecomment-846565698


   @andygrove currently `collect` gathers the `SendableRecordBatchStream` into a `Vec` inside the function before returning.
   Do you intend for it to return the stream without collecting it first?
   





[GitHub] [arrow-datafusion] andygrove closed issue #47: DataFrame.collect() should return async stream rather than a Vec

Posted by GitBox <gi...@apache.org>.
andygrove closed issue #47:
URL: https://github.com/apache/arrow-datafusion/issues/47


   




