Posted to issues@arrow.apache.org by "Romain François (JIRA)" <ji...@apache.org> on 2019/06/26 08:24:00 UTC

[jira] [Updated] (ARROW-5718) [R] auto splice data frames in record_batch() and table()

     [ https://issues.apache.org/jira/browse/ARROW-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Romain François updated ARROW-5718:
-----------------------------------
    Summary: [R] auto splice data frames in record_batch() and table()  (was: [R] Add as_record_batch())

> [R] auto splice data frames in record_batch() and table()
> ---------------------------------------------------------
>
>                 Key: ARROW-5718
>                 URL: https://issues.apache.org/jira/browse/ARROW-5718
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Assignee: Romain François
>            Priority: Minor
>             Fix For: 0.14.0
>
>
> ARROW-3814 / [https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94] changed the API of `record_batch()` and `arrow::table()` such that you can no longer pass a data.frame to either function without [massaging it yourself|https://github.com/apache/arrow/pull/3565/files#diff-09c05d1a6ff41bed094fbccfa76395a6R27]. That broke the sparklyr integration tests with an opaque `cannot infer type from data` error, and it's unfortunate that there's no longer a direct way to go from a data.frame to a record batch, which sounds like a common need.
> In order to follow best practices (cf. the [tibble|https://tibble.tidyverse.org/] package, for example), we should (1) add an {{as_record_batch()}} function, whose data.frame method is probably just {{as_record_batch.data.frame <- function(x) record_batch(!!!x)}}; and (2) if a user supplies a single, unnamed data.frame as the argument to {{record_batch()}}, raise an error that says to use {{as_record_batch()}} instead. We may later decide that {{record_batch()}} should call {{as_record_batch()}} automatically, but in case that is too magical and prevents some legitimate use case, let's hold off for now. It's easier to add magic than to remove it.
> Once this function exists, the sparklyr tests can try to use {{as_record_batch()}} and, if that function doesn't exist, fall back to {{record_batch()}}: its absence means the installed arrow is an older release that predates {{as_record_batch()}}, so {{record_batch(df)}} should still work.
> cc [~javierluraschi]
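
The fallback described for the sparklyr tests could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `choose_converter` and `to_record_batch` are hypothetical and do not come from the issue, and the sketch assumes that `as_record_batch()`, if it exists at all, is exported from the arrow namespace.

```r
# Hypothetical helper (not part of arrow or sparklyr): given the names a
# package exports, pick the conversion function to use. Prefer
# as_record_batch() when present, otherwise fall back to record_batch().
choose_converter <- function(exports) {
  if ("as_record_batch" %in% exports) "as_record_batch" else "record_batch"
}

# Hypothetical wrapper: look up the chosen function in the installed arrow
# package and apply it to a data.frame. Assumes arrow is installed; the
# lookup only happens when the wrapper is called.
to_record_batch <- function(df) {
  fn_name <- choose_converter(getNamespaceExports("arrow"))
  converter <- getExportedValue("arrow", fn_name)
  converter(df)
}
```

With arrow installed, a test could then call `to_record_batch(df)` without caring which release is present. Keeping the choice in a separate function that takes the export list makes the decision itself checkable without arrow.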



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)