Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2022/09/29 14:22:00 UTC

[jira] [Commented] (ARROW-17886) [R] Convert schema to the corresponding ptype (zero-row data frame)?

    [ https://issues.apache.org/jira/browse/ARROW-17886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611061#comment-17611061 ] 

Dewey Dunnington commented on ARROW-17886:
------------------------------------------

This hasn't been implemented yet, but we will probably add it, at least internally, to support additional tidyselect helpers (see ARROW-12778). In the meantime, you may be able to use this workaround:

{code:R}
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.

simulate_data_frame <- function(schema) {
  # Create a zero-length Array for each field in the schema
  arrays <- lapply(schema$fields, function(field) concat_arrays(type = field$type))
  # Convert each empty Array to an R vector, falling back to
  # vctrs::unspecified() for types that can't be converted
  vectors <- lapply(
    arrays,
    function(array) tryCatch(
      as.vector(array),
      error = function(...) vctrs::unspecified()
    )
  )

  names(vectors) <- names(schema)
  tibble::new_tibble(vectors, nrow = 0)
}

simulate_data_frame(schema(col1 = int32(), col2 = string()))
#> # A tibble: 0 × 2
#> # … with 2 variables: col1 <int>, col2 <chr>
{code}
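
For the DBI use case described in the issue, the resulting zero-row data frame could then be handed to a driver's existing type mapping via DBI::dbDataType(). This is only a sketch: RSQLite is just an assumed example backend, and columns that fall back to vctrs::unspecified() may need special handling.

{code:R}
# Sketch only: map the zero-row data frame from simulate_data_frame() above
# to database column types using a driver's dbDataType() method.
# RSQLite is an arbitrary example; any DBI driver should work similarly.
ptype <- simulate_data_frame(schema(col1 = int32(), col2 = string()))
DBI::dbDataType(RSQLite::SQLite(), ptype)
# returns a named character vector of SQL types, one element per column
{code}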


> [R] Convert schema to the corresponding ptype (zero-row data frame)?
> --------------------------------------------------------------------
>
>                 Key: ARROW-17886
>                 URL: https://issues.apache.org/jira/browse/ARROW-17886
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Kirill Müller
>            Priority: Minor
>
> When fetching data, e.g. from a RecordBatchReader, I would like to know ahead of time what the data will look like after it is converted to a data frame. I have found a way using head(x, 0) (from utils), but I'm not sure it's efficient in all scenarios.
> My use case is the Arrow extension to DBI, in particular the default implementation for drivers that don't speak Arrow yet. I'd like to know which types the columns should have in the database. I can already infer this from the corresponding R types, but those existing drivers don't know about Arrow types.
> Should we support as.data.frame() for schema objects? The semantics would be to return a zero-row data frame with correct column names and types.
> library(arrow)
> #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
> #> 
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #> 
> #>     timestamp
> data <- data.frame(
>   a = 1:3,
>   b = 2.5,
>   c = "three",
>   stringsAsFactors = FALSE
> )
> data$d <- blob::blob(as.raw(1:10))
> tbl <- arrow::as_arrow_table(data)
> rbr <- arrow::as_record_batch_reader(tbl)
> tibble::as_tibble(head(rbr, 0))
> #> # A tibble: 0 × 4
> #> # … with 4 variables: a <int>, b <dbl>, c <chr>, d <blob>
> rbr$read_table()
> #> Table
> #> 3 rows x 4 columns
> #> $a <int32>
> #> $b <double>
> #> $c <string>
> #> $d <<blob[0]>>
> #> 
> #> See $metadata for additional Schema metadata


