Posted to issues@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2022/07/11 13:33:00 UTC
[jira] [Created] (ARROW-17038) [R] to_arrow() on db connection should hold reference to con
Neal Richardson created ARROW-17038:
---------------------------------------
Summary: [R] to_arrow() on db connection should hold reference to con
Key: ARROW-17038
URL: https://issues.apache.org/jira/browse/ARROW-17038
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Neal Richardson
Currently, to_arrow() on a duckdb connection returns a RecordBatchReader. This works fine until you want to query again, because a RecordBatchReader is one-shot: once you've consumed it, you can't read from it again. One place where this gets in the way is the dplyr::glimpse() method (ARROW-16776), which shows a preview of the data: you can't preview a RecordBatchReader's data without consuming part of it.
Going the other direction, duckdb solves this by holding a reference to the Dataset/query object and calling Scanner$create() on it on demand, which it can do multiple times.
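To make the difference concrete, here is a rough, language-neutral sketch in plain Python. It is not the arrow or duckdb API: `one_shot_reader` and `ReScannable` are hypothetical names, and lists of integers stand in for record batches. It only illustrates why a one-shot reader cannot be previewed and then read in full, while an object that holds a reference to its source can create a fresh reader each time.

```python
from typing import Callable, Iterator, List

Batch = List[int]  # stand-in for a record batch (assumption for illustration)


def one_shot_reader(batches: List[Batch]) -> Iterator[Batch]:
    """Hypothetical stand-in for a RecordBatchReader: consumable only once."""
    return iter(batches)


class ReScannable:
    """Hypothetical wrapper that keeps a reference to the source and
    builds a fresh reader on demand instead of storing a single reader."""

    def __init__(self, make_reader: Callable[[], Iterator[Batch]]) -> None:
        self._make_reader = make_reader

    def head(self, n: int) -> List[int]:
        # Preview: uses its own reader, so later reads are unaffected.
        rows: List[int] = []
        for batch in self._make_reader():
            rows.extend(batch)
            if len(rows) >= n:
                break
        return rows[:n]

    def collect(self) -> List[int]:
        # Full read: again starts from a fresh reader.
        return [row for batch in self._make_reader() for row in batch]


source = [[1, 2, 3], [4, 5, 6]]

# One-shot behaviour: the second pass finds nothing left to read.
reader = one_shot_reader(source)
first_pass = [row for batch in reader for row in batch]
second_pass = [row for batch in reader for row in batch]

# Holding a reference to the source: preview and full read both work.
query = ReScannable(lambda: one_shot_reader(source))
preview = query.head(2)
full = query.collect()
```

In this sketch the preview and the full read never share a reader, which is the property a glimpse()-style method needs.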
--
This message was sent by Atlassian Jira
(v8.20.10#820010)