Posted to issues@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2022/07/11 13:33:00 UTC

[jira] [Created] (ARROW-17038) [R] to_arrow() on db connection should hold reference to con

Neal Richardson created ARROW-17038:
---------------------------------------

             Summary: [R] to_arrow() on db connection should hold reference to con
                 Key: ARROW-17038
                 URL: https://issues.apache.org/jira/browse/ARROW-17038
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Neal Richardson


Currently, to_arrow() on a duckdb connection returns a RecordBatchReader. This works until you want to query the data again, because a RecordBatchReader is one-shot: once you've consumed it, you can't read it again. One place where this gets in the way is the dplyr::glimpse() method (ARROW-16776), which shows a preview of the data. You can't preview a RecordBatchReader's data without consuming part of it.
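A minimal, language-neutral sketch of the problem, assuming a plain Python generator is a fair stand-in for a one-shot RecordBatchReader. make_reader() and preview() are hypothetical names and are not part of the arrow or duckdb APIs. A glimpse()-style preview pulls the first batch, and those rows are then missing from any later full read:

```python
from typing import Iterator, List

Batch = List[int]

def make_reader() -> Iterator[Batch]:
    # Hypothetical stand-in for a one-shot RecordBatchReader:
    # a generator yielding "batches" (lists of rows) exactly once.
    for batch in ([1, 2, 3], [4, 5, 6], [7, 8, 9]):
        yield batch

def preview(reader: Iterator[Batch]) -> Batch:
    # A glimpse()-style preview: pull only the first batch.
    return next(reader, [])

reader = make_reader()
first = preview(reader)                          # consumes [1, 2, 3]
rest = [row for batch in reader for row in batch]
print(first)         # [1, 2, 3]
print(rest)          # [4, 5, 6, 7, 8, 9]: the previewed rows are gone
print(list(reader))  # []: the reader is exhausted and cannot be re-read
```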

Going the other direction, duckdb solves this by holding a reference to the Dataset/query object and calling Scanner$create() on it on demand, which it can do as many times as needed.
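The same approach can be sketched for to_arrow(): instead of handing out a single reader, keep a reference to whatever can produce the data (for example, the connection plus the query) and open a fresh reader each time one is needed. This is analogous to calling Scanner$create() again. The sketch below is hypothetical. ReopenableSource and run_query() are illustrative names, and generators again stand in for Arrow readers; none of this is the actual arrow or duckdb API:

```python
from typing import Callable, Iterator, List

Batch = List[int]

class ReopenableSource:
    """Hypothetical wrapper that holds a reference to the data's origin
    (e.g. connection + query) and builds a new one-shot reader per request."""

    def __init__(self, open_reader: Callable[[], Iterator[Batch]]):
        # Kept alive so the data can be re-read later.
        self._open_reader = open_reader

    def reader(self) -> Iterator[Batch]:
        # Each call starts a brand-new scan from the beginning.
        return self._open_reader()

    def head(self, n: int) -> List[int]:
        # Preview using a throwaway reader; later reads are unaffected.
        rows: List[int] = []
        for batch in self.reader():
            rows.extend(batch)
            if len(rows) >= n:
                break
        return rows[:n]

    def collect(self) -> List[int]:
        return [row for batch in self.reader() for row in batch]

def run_query() -> Iterator[Batch]:
    # Hypothetical query: re-executed on every call.
    yield from ([1, 2, 3], [4, 5, 6], [7, 8, 9])

src = ReopenableSource(run_query)
print(src.head(2))    # [1, 2]
print(src.collect())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(src.collect())  # same result again: each call opens a new reader
```

The design choice is that the wrapper, not the reader, owns the lifetime of the source. A preview and a full read therefore never compete for the same stream.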



--
This message was sent by Atlassian Jira
(v8.20.10#820010)