Posted to issues@arrow.apache.org by "Kirill Müller (Jira)" <ji...@apache.org> on 2022/09/29 03:43:00 UTC

[jira] [Created] (ARROW-17886) [R] Convert schema to the corresponding ptype (zero-row data frame)?

Kirill Müller created ARROW-17886:
-------------------------------------

             Summary: [R] Convert schema to the corresponding ptype (zero-row data frame)?
                 Key: ARROW-17886
                 URL: https://issues.apache.org/jira/browse/ARROW-17886
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Kirill Müller


When fetching data, e.g. from a RecordBatchReader, I would like to know ahead of time what the data will look like after it is converted to a data frame. I have found a way using utils::head() with n = 0 (see the example below), but I'm not sure whether it is efficient in all scenarios.

My use case is the Arrow extension to DBI, in particular the default implementation for drivers that don't speak Arrow yet. I'd like to know which types the columns should have in the database. Existing drivers can already infer this from the corresponding R types, but they don't know about Arrow types.

Should we support as.data.frame() for schema objects? The semantics would be to return a zero-row data frame with correct column names and types.
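
To illustrate the intended semantics, here is a minimal base-R sketch. It does not use the arrow package at all: schema_to_ptype() is a hypothetical helper, and the mapping from type names to R vectors is an assumption that covers only three types. A real implementation would work on an arrow Schema object and handle many more types.

```r
# Hypothetical helper, not part of the arrow package: build a zero-row
# data frame (a "ptype") from a named character vector of type names.
schema_to_ptype <- function(types) {
  # Assumed mapping from Arrow type names to empty R vectors.
  prototypes <- list(
    int32  = integer(),
    double = double(),
    string = character()
  )

  # Fail loudly for types this sketch does not cover.
  unknown <- setdiff(unname(types), names(prototypes))
  if (length(unknown) > 0) {
    stop("No R prototype known for: ", paste(unknown, collapse = ", "))
  }

  # lapply() keeps the names of `types`, which become the column names.
  columns <- lapply(types, function(type) prototypes[[type]])
  as.data.frame(columns, stringsAsFactors = FALSE)
}

ptype <- schema_to_ptype(c(a = "int32", b = "double", c = "string"))
str(ptype)
```

The result has zero rows but carries the column names and R types, which is the information a DBI driver would need to pick database column types.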


library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp

data <- data.frame(
  a = 1:3,
  b = 2.5,
  c = "three",
  stringsAsFactors = FALSE
)
data$d <- blob::blob(as.raw(1:10))

tbl <- arrow::as_arrow_table(data)
rbr <- arrow::as_record_batch_reader(tbl)

# Zero-row preview: shows the R column types after conversion
tibble::as_tibble(head(rbr, 0))
#> # A tibble: 0 × 4
#> # … with 4 variables: a <int>, b <dbl>, c <chr>, d <blob>
# The reader still yields all rows afterwards
rbr$read_table()
#> Table
#> 3 rows x 4 columns
#> $a <int32>
#> $b <double>
#> $c <string>
#> $d <<blob[0]>>
#> 
#> See $metadata for additional Schema metadata


