Posted to jira@arrow.apache.org by "Nicola Crane (Jira)" <ji...@apache.org> on 2022/10/27 19:36:00 UTC

[jira] [Commented] (ARROW-17886) [R] Convert schema to the corresponding ptype (zero-row data frame)?

    [ https://issues.apache.org/jira/browse/ARROW-17886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625280#comment-17625280 ] 

Nicola Crane commented on ARROW-17886:
--------------------------------------

Closing this as I believe that this is covered by the changes made in ARROW-12105, but if not, please let me know.

> [R] Convert schema to the corresponding ptype (zero-row data frame)?
> --------------------------------------------------------------------
>
>                 Key: ARROW-17886
>                 URL: https://issues.apache.org/jira/browse/ARROW-17886
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Kirill Müller
>            Priority: Minor
>
> When fetching data, e.g. from a RecordBatchReader, I would like to know ahead of time what the data will look like after it's converted to a data frame. I have found a way using utils::head() with n = 0, but I'm not sure if it's efficient in all scenarios.
> My use case is the Arrow extension to DBI, in particular the default implementation for drivers that don't speak Arrow yet. I'd like to know which types the columns should have in the database. I can already infer this from the corresponding R types, but those existing drivers don't know about Arrow types.
> Should we support as.data.frame() for schema objects? The semantics would be to return a zero-row data frame with correct column names and types.
> library(arrow)
> #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
> #> 
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #> 
> #>     timestamp
> data <- data.frame(
>   a = 1:3,
>   b = 2.5,
>   c = "three",
>   stringsAsFactors = FALSE
> )
> data$d <- blob::blob(as.raw(1:10))
> tbl <- arrow::as_arrow_table(data)
> rbr <- arrow::as_record_batch_reader(tbl)
> tibble::as_tibble(head(rbr, 0))
> #> # A tibble: 0 × 4
> #> # … with 4 variables: a <int>, b <dbl>, c <chr>, d <blob>
> rbr$read_table()
> #> Table
> #> 3 rows x 4 columns
> #> $a <int32>
> #> $b <double>
> #> $c <string>
> #> $d <<blob[0]>>
> #> 
> #> See $metadata for additional Schema metadata


