
Posted to issues@arrow.apache.org by "Neal Richardson (JIRA)" <ji...@apache.org> on 2019/06/24 23:49:00 UTC

[jira] [Created] (ARROW-5718) [R] Add as_record_batch()

Neal Richardson created ARROW-5718:
--------------------------------------

             Summary: [R] Add as_record_batch()
                 Key: ARROW-5718
                 URL: https://issues.apache.org/jira/browse/ARROW-5718
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Neal Richardson
             Fix For: 0.14.0


ARROW-3814 / [https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94] changed the API of `record_batch()` and `arrow::table()` so that you can no longer pass a data.frame directly to either function without [massaging it yourself|https://github.com/apache/arrow/pull/3565/files#diff-09c05d1a6ff41bed094fbccfa76395a6R27]. That broke the sparklyr integration tests with an opaque `cannot infer type from data` error. It also leaves no direct way to go from a data.frame to a record batch, which sounds like a common need.

After some discussion, we resolved that a solution would be to (1) add an {{as_record_batch}} function, whose data.frame method is probably just {{as_record_batch.data.frame <- function(x) record_batch(!!!x)}}; and (2) if a user supplies a single, unnamed data.frame as the argument to {{record_batch()}}, raise an error that says to use {{as_record_batch()}}. We may later decide that {{record_batch()}} should call {{as_record_batch()}} automatically, but in case that is too magical and prevents some legitimate use case, let's hold off for now. It's easier to add magic than remove it.
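
To make the two parts concrete, here is a minimal sketch in base R. It is an illustration under assumptions, not the eventual implementation: {{record_batch_stub}} is a hypothetical stand-in for arrow's {{record_batch()}} so the example runs without arrow installed, the error wording is invented, and {{do.call()}} is used as a base-R substitute for the {{!!!}} splice.

```r
# Hypothetical stand-in for arrow's record_batch(): takes named columns via `...`.
record_batch_stub <- function(...) {
  columns <- list(...)
  # Part (2): a single, unnamed data.frame argument gets a pointed error
  # instead of an opaque type-inference failure. The message text is invented.
  if (length(columns) == 1L && is.null(names(columns)) &&
      is.data.frame(columns[[1L]])) {
    stop("record_batch() does not accept a data.frame; use as_record_batch()",
         call. = FALSE)
  }
  structure(columns, class = "record_batch_stub")
}

# Part (1): an S3 generic with a data.frame method.
as_record_batch <- function(x, ...) UseMethod("as_record_batch")

as_record_batch.data.frame <- function(x, ...) {
  # Pass each column as a named argument; stands in for record_batch(!!!x).
  do.call(record_batch_stub, as.list(x))
}

df <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)

# Going through the generic works and keeps the column names.
batch <- as_record_batch(df)

# Passing the data.frame straight to the constructor raises the pointed error.
err <- tryCatch(record_batch_stub(df), error = function(e) conditionMessage(e))
```

Keeping the error in the constructor rather than dispatching automatically matches the "hold off on magic" decision above: the user is told what to call, and nothing is converted implicitly.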

Once this function exists, sparklyr tests can try to use {{as_record_batch}} and, if that function doesn't exist, fall back to {{record_batch}}. A missing {{as_record_batch}} means the installed arrow is an older release that predates it, so {{record_batch(df)}} should still work there.
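
The fallback could look roughly like the sketch below. The helper name {{df_to_record_batch}} and its {{ns}} argument are hypothetical. With the real package one would presumably pass {{asNamespace("arrow")}}; here two plain environments play the roles of an older and a newer arrow so the example is self-contained.

```r
# Hypothetical helper: prefer as_record_batch() when the namespace provides it,
# otherwise fall back to record_batch(), which still accepts a data.frame
# in older releases.
df_to_record_batch <- function(df, ns) {
  if (exists("as_record_batch", envir = ns, mode = "function", inherits = FALSE)) {
    get("as_record_batch", envir = ns, inherits = FALSE)(df)
  } else {
    get("record_batch", envir = ns, inherits = FALSE)(df)
  }
}

# Stand-in for an older arrow: only record_batch(), which accepts a data.frame.
old_arrow <- new.env()
old_arrow$record_batch <- function(x) "record_batch"

# Stand-in for a newer arrow: record_batch() rejects a data.frame,
# as_record_batch() accepts it.
new_arrow <- new.env()
new_arrow$record_batch <- function(x) stop("use as_record_batch()")
new_arrow$as_record_batch <- function(x) "as_record_batch"

df <- data.frame(a = 1:3)
old_result <- df_to_record_batch(df, old_arrow)
new_result <- df_to_record_batch(df, new_arrow)
```

Checking for the function rather than comparing version numbers keeps the test code independent of exactly which release first ships {{as_record_batch}}.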

cc [~javierluraschi]


