You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/03 15:10:23 UTC

[GitHub] [arrow] romainfrancois opened a new pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

romainfrancois opened a new pull request #10445:
URL: https://github.com/apache/arrow/pull/10445


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on a change in pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on a change in pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#discussion_r655145649



##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());

Review comment:
       ```r
   > .Internal(inspect(as.vector(Array$create(1:10))))
   @7fc930ddbbc8 13 INTSXP g0c0 [REF(65535)] arrow::Array<int32, NONULL> len=10, Array=<0x7fc92b27ad98>, xp=<0x7fc930ddbc38>)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-856934564


   Cool, so that would just be `if(GetBoolOption("arrow.altrep", true) && array->null_count() == 0)`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#discussion_r655614600



##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,166 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+  using Pointer = cpp11::external_pointer<std::shared_ptr<Array>,
+                                          cpp11::default_deleter<std::shared_ptr<Array>>>;

Review comment:
       ```suggestion
     static void DeleteArray(std::shared_ptr<Array>* ptr) {
       delete ptr;
     }
     using Pointer = cpp11::external_pointer<std::shared_ptr<Array>, DeleteArray>;
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-856654860


   An interesting side benefit from this is that we don't get altrep R vectors from slices: 
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> See arrow_info() for available features
   
   x <- 1:1e3+ 1L
   v <- Array$create(x)
   x1 <- v$as_vector()  
   .Internal(inspect(x1))
   #> @7face1b4ba18 13 INTSXP g0c0 [REF(65535)] std::shared_ptr<arrow::Array, int32, NONULL> (len=1000, ptr=0x7facdc27e998)
   
   x2 <- v$Slice(500)$as_vector()
   .Internal(inspect(x2))
   #> @7facde2bfd80 13 INTSXP g0c0 [REF(65535)] std::shared_ptr<arrow::Array, int32, NONULL> (len=500, ptr=0x7facdb0a2198)
   ```
   
   <sup>Created on 2021-06-08 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0)</sup>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on a change in pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on a change in pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#discussion_r655150340



##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());
+    return TRUE;
+  }
+
+  static const std::shared_ptr<Array>& Get(SEXP vec) {
+    return *cpp11::external_pointer<std::shared_ptr<Array>>(R_altrep_data1(vec));
+  }
+
+  static R_xlen_t Length(SEXP vec) { return Get(vec)->length(); }
+
+  static const void* Dataptr_or_null(SEXP vec) {
+    return Get(vec)->data()->template GetValues<data_type>(1);
+  }
+
+  static SEXP Duplicate(SEXP vec, Rboolean) {
+    const auto& array = Get(vec);
+    auto size = array->length();
+
+    SEXP copy = PROTECT(Rf_allocVector(sexp_type, array->length()));
+
+    memcpy(DATAPTR(copy), Dataptr_or_null(vec), size * sizeof(data_type));
+
+    UNPROTECT(1);
+    return copy;
+  }
+
+  static void* Dataptr(SEXP vec, Rboolean writeable) {
+    return const_cast<void*>(Dataptr_or_null(vec));
+  }
+
+  // by definition, there are no NA
+  static int No_NA(SEXP vec) { return 1; }
+
+  static void Init(R_altrep_class_t class_t, DllInfo* dll) {
+    // altrep
+    R_set_altrep_Length_method(class_t, ArrayNoNull::Length);
+    R_set_altrep_Inspect_method(class_t, ArrayNoNull::Inspect);
+    R_set_altrep_Duplicate_method(class_t, ArrayNoNull::Duplicate);
+
+    // altvec
+    R_set_altvec_Dataptr_method(class_t, ArrayNoNull::Dataptr);
+    R_set_altvec_Dataptr_or_null_method(class_t, ArrayNoNull::Dataptr_or_null);
+  }
+};
+
+struct DoubleArrayNoNull {
+  static R_altrep_class_t class_t;
+
+  static void Init(DllInfo* dll) {
+    class_t = R_make_altreal_class("array_nonull_dbl_vector", "arrow", dll);
+    ArrayNoNull<REALSXP>::Init(class_t, dll);
+    R_set_altreal_No_NA_method(class_t, ArrayNoNull<REALSXP>::No_NA);
+  }
+
+  static SEXP Make(const std::shared_ptr<Array>& array) {
+    return ArrayNoNull<REALSXP>::Make(class_t, array);
+  }
+};
+
+struct Int32ArrayNoNull {
+  static R_altrep_class_t class_t;
+
+  static void Init(DllInfo* dll) {
+    class_t = R_make_altinteger_class("array_nonull_int_vector", "arrow", dll);
+    ArrayNoNull<INTSXP>::Init(class_t, dll);
+    R_set_altinteger_No_NA_method(class_t, ArrayNoNull<INTSXP>::No_NA);
+  }
+
+  static SEXP Make(const std::shared_ptr<Array>& array) {
+    return ArrayNoNull<INTSXP>::Make(class_t, array);
+  }
+};
+
+R_altrep_class_t Int32ArrayNoNull::class_t;
+R_altrep_class_t DoubleArrayNoNull::class_t;
+
+void Init_Altrep_classes(DllInfo* dll) {
+  DoubleArrayNoNull::Init(dll);
+  Int32ArrayNoNull::Init(dll);
+}
+
+SEXP MakeDoubleArrayNoNull(const std::shared_ptr<Array>& array) {
+  return DoubleArrayNoNull::Make(array);
+}
+
+SEXP MakeInt32ArrayNoNull(const std::shared_ptr<Array>& array) {
+  return Int32ArrayNoNull::Make(array);

Review comment:
       I'll think about where to mention this, but this isn't limited to these altrep variants. any int32 array that happens to have a non null `NA_integer_` will be mismanaged by R with this sentinel approach. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on a change in pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on a change in pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#discussion_r655145649



##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());

Review comment:
       ```
   > .Internal(inspect(as.vector(Array$create(1:10))))
   @7fc930ddbbc8 13 INTSXP g0c0 [REF(65535)] arrow::Array<int32, NONULL> len=10, Array=<0x7fc92b27ad98>, xp=<0x7fc930ddbc38>)
   ```

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());

Review comment:
       Alternatively, we could call the `inspect_subtree()` function on the xp, and then get something like this: 
   
   ```
   > .Internal(inspect(as.vector(Array$create(1:10))))
   @7feaa7212e38 13 INTSXP g0c0 [REF(65535)] arrow::Array<int32, NONULL> len=10, Array=<0x7fea9ed38948>, xp=<0x7feaa7212ea8>)
     @7feaa7212ea8 22 EXTPTRSXP g0c0 [REF(4)] 
   ``` 

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());

Review comment:
       ``` r
   library(arrow, warn.conflicts = FALSE)
   #> See arrow_info() for available features
   .Internal(inspect(as.vector(Array$create(1:10))))
   #> @7f91050adcf0 13 INTSXP g0c0 [REF(65535)] arrow::Array<int32, NONULL> len=10, Array=<0x7f90fd0c8ec8>
   #>   @7f91050add60 22 EXTPTRSXP g0c0 [REF(4)]
   ```
   
   <sup>Created on 2021-06-21 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0.9000)</sup>




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on a change in pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on a change in pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#discussion_r655145649



##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());

Review comment:
       ```r
   > .Internal(inspect(as.vector(Array$create(1:10))))
   @7fc930ddbbc8 13 INTSXP g0c0 [REF(65535)] arrow::Array<int32, NONULL> len=10, Array=<0x7fc92b27ad98>, xp=<0x7fc930ddbc38>)
   ```

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());

Review comment:
       Alternatively, we could call the `inspect_subtree()` function on the xp, and then get something like this: 
   
   ```
   > .Internal(inspect(as.vector(Array$create(1:10))))
   @7feaa7212e38 13 INTSXP g0c0 [REF(65535)] arrow::Array<int32, NONULL> len=10, Array=<0x7fea9ed38948>, xp=<0x7feaa7212ea8>)
     @7feaa7212ea8 22 EXTPTRSXP g0c0 [REF(4)] 
   ``` 

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());

Review comment:
       ```
   > .Internal(inspect(as.vector(Array$create(1:10))))
   @7fc930ddbbc8 13 INTSXP g0c0 [REF(65535)] arrow::Array<int32, NONULL> len=10, Array=<0x7fc92b27ad98>, xp=<0x7fc930ddbc38>)
   ```

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());
+    return TRUE;
+  }
+
+  static const std::shared_ptr<Array>& Get(SEXP vec) {
+    return *cpp11::external_pointer<std::shared_ptr<Array>>(R_altrep_data1(vec));
+  }
+
+  static R_xlen_t Length(SEXP vec) { return Get(vec)->length(); }
+
+  static const void* Dataptr_or_null(SEXP vec) {
+    return Get(vec)->data()->template GetValues<data_type>(1);
+  }
+
+  static SEXP Duplicate(SEXP vec, Rboolean) {
+    const auto& array = Get(vec);
+    auto size = array->length();
+
+    SEXP copy = PROTECT(Rf_allocVector(sexp_type, array->length()));
+
+    memcpy(DATAPTR(copy), Dataptr_or_null(vec), size * sizeof(data_type));
+
+    UNPROTECT(1);
+    return copy;
+  }
+
+  static void* Dataptr(SEXP vec, Rboolean writeable) {
+    return const_cast<void*>(Dataptr_or_null(vec));
+  }
+
+  // by definition, there are no NA
+  static int No_NA(SEXP vec) { return 1; }
+
+  static void Init(R_altrep_class_t class_t, DllInfo* dll) {
+    // altrep
+    R_set_altrep_Length_method(class_t, ArrayNoNull::Length);
+    R_set_altrep_Inspect_method(class_t, ArrayNoNull::Inspect);
+    R_set_altrep_Duplicate_method(class_t, ArrayNoNull::Duplicate);
+
+    // altvec
+    R_set_altvec_Dataptr_method(class_t, ArrayNoNull::Dataptr);
+    R_set_altvec_Dataptr_or_null_method(class_t, ArrayNoNull::Dataptr_or_null);
+  }
+};
+
+struct DoubleArrayNoNull {
+  static R_altrep_class_t class_t;
+
+  static void Init(DllInfo* dll) {
+    class_t = R_make_altreal_class("array_nonull_dbl_vector", "arrow", dll);
+    ArrayNoNull<REALSXP>::Init(class_t, dll);
+    R_set_altreal_No_NA_method(class_t, ArrayNoNull<REALSXP>::No_NA);
+  }
+
+  static SEXP Make(const std::shared_ptr<Array>& array) {
+    return ArrayNoNull<REALSXP>::Make(class_t, array);
+  }
+};
+
+struct Int32ArrayNoNull {
+  static R_altrep_class_t class_t;
+
+  static void Init(DllInfo* dll) {
+    class_t = R_make_altinteger_class("array_nonull_int_vector", "arrow", dll);
+    ArrayNoNull<INTSXP>::Init(class_t, dll);
+    R_set_altinteger_No_NA_method(class_t, ArrayNoNull<INTSXP>::No_NA);
+  }
+
+  static SEXP Make(const std::shared_ptr<Array>& array) {
+    return ArrayNoNull<INTSXP>::Make(class_t, array);
+  }
+};
+
+R_altrep_class_t Int32ArrayNoNull::class_t;
+R_altrep_class_t DoubleArrayNoNull::class_t;
+
+void Init_Altrep_classes(DllInfo* dll) {
+  DoubleArrayNoNull::Init(dll);
+  Int32ArrayNoNull::Init(dll);
+}
+
+SEXP MakeDoubleArrayNoNull(const std::shared_ptr<Array>& array) {
+  return DoubleArrayNoNull::Make(array);
+}
+
+SEXP MakeInt32ArrayNoNull(const std::shared_ptr<Array>& array) {
+  return Int32ArrayNoNull::Make(array);

Review comment:
       I'll think about where to mention this, but this isn't limited to these altrep variants. any int32 array that happens to have a non null `NA_integer_` will be mismanaged by R with this sentinel approach. 

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());

Review comment:
       ```
   > .Internal(inspect(as.vector(Array$create(1:10))))
   @7fc930ddbbc8 13 INTSXP g0c0 [REF(65535)] arrow::Array<int32, NONULL> len=10, Array=<0x7fc92b27ad98>, xp=<0x7fc930ddbc38>)
   ```

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());

Review comment:
       Alternatively, we could call the `inspect_subtree()` function on the xp, and then get something like this: 
   
   ```
   > .Internal(inspect(as.vector(Array$create(1:10))))
   @7feaa7212e38 13 INTSXP g0c0 [REF(65535)] arrow::Array<int32, NONULL> len=10, Array=<0x7fea9ed38948>, xp=<0x7feaa7212ea8>)
     @7feaa7212ea8 22 EXTPTRSXP g0c0 [REF(4)] 
   ``` 

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());

Review comment:
       ``` r
   library(arrow, warn.conflicts = FALSE)
   #> See arrow_info() for available features
   .Internal(inspect(as.vector(Array$create(1:10))))
   #> @7f91050adcf0 13 INTSXP g0c0 [REF(65535)] arrow::Array<int32, NONULL> len=10, Array=<0x7f90fd0c8ec8>
   #>   @7f91050add60 22 EXTPTRSXP g0c0 [REF(4)]
   ```
   
   <sup>Created on 2021-06-21 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0.9000)</sup>




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#discussion_r649483759



##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,180 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+struct array_nonull {
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());
+    return TRUE;
+  }
+
+  static std::shared_ptr<Array>& Get(SEXP vec) {
+    return *cpp11::external_pointer<std::shared_ptr<Array>>(R_altrep_data1(vec));
+  }
+
+  static R_xlen_t Length(SEXP vec) { return Get(vec)->length(); }
+
+  static const void* Dataptr_or_null(SEXP vec) {
+    auto& array = Get(vec);
+
+    int size = array->type_id() == Type::INT32 ? sizeof(int) : sizeof(double);
+    return reinterpret_cast<const void*>(array->data()->buffers[1]->data() +
+                                         size * array->offset());
+  }
+
+  static SEXP Duplicate(SEXP vec, Rboolean) {
+    auto& array = Get(vec);
+    auto size = array->length();
+
+    bool int_array = array->type_id() == Type::INT32;
+
+    SEXP copy = PROTECT(Rf_allocVector(int_array ? INTSXP : REALSXP, array->length()));
+
+    memcpy(DATAPTR(copy), Dataptr_or_null(vec),
+           int_array ? (size * sizeof(int)) : (size * sizeof(double)));
+
+    UNPROTECT(1);
+    return copy;
+  }
+
+  static void* Dataptr(SEXP vec, Rboolean writeable) {
+    return const_cast<void*>(Dataptr_or_null(vec));
+  }
+
+  // by definition, there are no NA
+  static int No_NA(SEXP vec) { return 1; }
+
+  static void Init(R_altrep_class_t class_t, DllInfo* dll) {
+    // altrep
+    R_set_altrep_Length_method(class_t, array_nonull::Length);
+    R_set_altrep_Inspect_method(class_t, array_nonull::Inspect);
+    R_set_altrep_Duplicate_method(class_t, array_nonull::Duplicate);
+
+    // altvec
+    R_set_altvec_Dataptr_method(class_t, array_nonull::Dataptr);
+    R_set_altvec_Dataptr_or_null_method(class_t, array_nonull::Dataptr_or_null);
+  }
+};
+
+struct array_nonull_dbl_vector {
+  static R_altrep_class_t class_t;
+
+  static void Init(DllInfo* dll) {
+    class_t = R_make_altreal_class("array_nonull_dbl_vector", "arrow", dll);
+    array_nonull::Init(class_t, dll);
+    R_set_altreal_No_NA_method(class_t, array_nonull::No_NA);
+  }
+};
+
+struct array_nonull_int_vector {
+  static R_altrep_class_t class_t;
+
+  static void Init(DllInfo* dll) {
+    class_t = R_make_altinteger_class("array_nonull_int_vector", "arrow", dll);
+    array_nonull::Init(class_t, dll);
+    R_set_altinteger_No_NA_method(class_t, array_nonull::No_NA);
+  }
+};
+
+R_altrep_class_t array_nonull_int_vector::class_t;
+R_altrep_class_t array_nonull_dbl_vector::class_t;
+
+void Init_Altrep_classes(DllInfo* dll) {
+  array_nonull_dbl_vector::Init(dll);
+  array_nonull_int_vector::Init(dll);
+}
+
+SEXP Make_array_nonull_dbl_vector(const std::shared_ptr<Array>& array) {
+  return array_nonull::Make(array_nonull_dbl_vector::class_t, array);
+}
+
+SEXP Make_array_nonull_int_vector(const std::shared_ptr<Array>& array) {
+  return array_nonull::Make(array_nonull_int_vector::class_t, array);
+}
+
+}  // namespace r
+}  // namespace arrow
+
+// TODO: when arrow depends on R 3.5 we can eliminate this

Review comment:
       We currently support (and test, via crossbow) R versions back to 3.3. (To be clear, this is unrelated to CRAN requirements, compilers, etc., purely about the R APIs.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-864815278


   @nealrichardson I believe this is ready to go


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-856892875


   Interesting, we could make that opt-in. For now it's all happening here so would be easy enough to make conditional: 
   
   ```cpp
   // [[arrow::export]]
   SEXP Array__as_vector(const std::shared_ptr<arrow::Array>& array) {
     auto type = array->type();
   
   #if defined(HAS_ALTREP)
     if (array->null_count() == 0) {
       switch (type->id()) {
         case arrow::Type::DOUBLE:
           return arrow::r::Make_array_nonull_dbl_vector(array);
         case arrow::Type::INT32:
           return arrow::r::Make_array_nonull_int_vector(array);
           // case arrow::Type::INT64:
           //   return arrow::r::Make_array_nonull_int64_vector(array);
   
         default:
           break;
       }
     }
   #endif
   
     return arrow::r::ArrayVector__as_vector(array->length(), type, {array});
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-856853833


   This is super cool. We should get some benchmarking on this. 
   
   Tangentially related, should we have an `options(arrow.altrep)` or something to govern whether to use ALTREP? I know `vroom` has a function argument to enable or disable it (though I don't know why exactly). Might be useful to have in case there are unforeseen issues, or if nothing else it would make it easy for people to test side-by-side the performance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#discussion_r648947602



##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,180 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+struct array_nonull {
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());
+    return TRUE;
+  }
+
+  static std::shared_ptr<Array>& Get(SEXP vec) {
+    return *cpp11::external_pointer<std::shared_ptr<Array>>(R_altrep_data1(vec));
+  }
+
+  static R_xlen_t Length(SEXP vec) { return Get(vec)->length(); }
+
+  static const void* Dataptr_or_null(SEXP vec) {
+    auto& array = Get(vec);
+
+    int size = array->type_id() == Type::INT32 ? sizeof(int) : sizeof(double);
+    return reinterpret_cast<const void*>(array->data()->buffers[1]->data() +
+                                         size * array->offset());

Review comment:
       Or `array->data()->GetValues<c_type>(1)` where `using c_type = typename ArrowType::c_type`.
   (`ArrowType` being the class template parameter, e.g. `Int64Type` or `DoubleType`)

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,180 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+struct array_nonull {
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());
+    return TRUE;
+  }
+
+  static std::shared_ptr<Array>& Get(SEXP vec) {
+    return *cpp11::external_pointer<std::shared_ptr<Array>>(R_altrep_data1(vec));
+  }
+
+  static R_xlen_t Length(SEXP vec) { return Get(vec)->length(); }
+
+  static const void* Dataptr_or_null(SEXP vec) {
+    auto& array = Get(vec);
+
+    int size = array->type_id() == Type::INT32 ? sizeof(int) : sizeof(double);

Review comment:
       Is it possible to make this a template class instead of manually switching on the type at runtime?

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,180 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+struct array_nonull {
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());
+    return TRUE;
+  }
+
+  static std::shared_ptr<Array>& Get(SEXP vec) {

Review comment:
       Return a const reference?

##########
File path: r/src/array_to_vector.cpp
##########
@@ -1275,7 +1276,22 @@ cpp11::writable::list to_dataframe_parallel(
 
 // [[arrow::export]]
 SEXP Array__as_vector(const std::shared_ptr<arrow::Array>& array) {
-  return arrow::r::ArrayVector__as_vector(array->length(), array->type(), {array});
+  auto type = array->type();
+
+#if defined(HAS_ALTREP)
+  if (array->null_count() == 0 && arrow::r::GetBoolOption("arrow.use_altrep", true)) {
+    switch (type->id()) {
+      case arrow::Type::DOUBLE:
+        return arrow::r::Make_array_nonull_dbl_vector(array);
+      case arrow::Type::INT32:
+        return arrow::r::Make_array_nonull_int_vector(array);
+      default:

Review comment:
       Is there a plan to support other types here?

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,180 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+struct array_nonull {
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());
+    return TRUE;
+  }
+
+  static std::shared_ptr<Array>& Get(SEXP vec) {
+    return *cpp11::external_pointer<std::shared_ptr<Array>>(R_altrep_data1(vec));
+  }
+
+  static R_xlen_t Length(SEXP vec) { return Get(vec)->length(); }
+
+  static const void* Dataptr_or_null(SEXP vec) {
+    auto& array = Get(vec);
+
+    int size = array->type_id() == Type::INT32 ? sizeof(int) : sizeof(double);
+    return reinterpret_cast<const void*>(array->data()->buffers[1]->data() +
+                                         size * array->offset());
+  }
+
+  static SEXP Duplicate(SEXP vec, Rboolean) {
+    auto& array = Get(vec);
+    auto size = array->length();
+
+    bool int_array = array->type_id() == Type::INT32;
+
+    SEXP copy = PROTECT(Rf_allocVector(int_array ? INTSXP : REALSXP, array->length()));
+
+    memcpy(DATAPTR(copy), Dataptr_or_null(vec),
+           int_array ? (size * sizeof(int)) : (size * sizeof(double)));
+
+    UNPROTECT(1);
+    return copy;
+  }
+
+  static void* Dataptr(SEXP vec, Rboolean writeable) {
+    return const_cast<void*>(Dataptr_or_null(vec));
+  }
+
+  // by definition, there are no NA
+  static int No_NA(SEXP vec) { return 1; }
+
+  static void Init(R_altrep_class_t class_t, DllInfo* dll) {
+    // altrep
+    R_set_altrep_Length_method(class_t, array_nonull::Length);
+    R_set_altrep_Inspect_method(class_t, array_nonull::Inspect);
+    R_set_altrep_Duplicate_method(class_t, array_nonull::Duplicate);
+
+    // altvec
+    R_set_altvec_Dataptr_method(class_t, array_nonull::Dataptr);
+    R_set_altvec_Dataptr_or_null_method(class_t, array_nonull::Dataptr_or_null);
+  }
+};
+
+struct array_nonull_dbl_vector {
+  static R_altrep_class_t class_t;
+
+  static void Init(DllInfo* dll) {
+    class_t = R_make_altreal_class("array_nonull_dbl_vector", "arrow", dll);
+    array_nonull::Init(class_t, dll);
+    R_set_altreal_No_NA_method(class_t, array_nonull::No_NA);
+  }
+};
+
+struct array_nonull_int_vector {
+  static R_altrep_class_t class_t;
+
+  static void Init(DllInfo* dll) {
+    class_t = R_make_altinteger_class("array_nonull_int_vector", "arrow", dll);
+    array_nonull::Init(class_t, dll);
+    R_set_altinteger_No_NA_method(class_t, array_nonull::No_NA);
+  }
+};
+
+R_altrep_class_t array_nonull_int_vector::class_t;
+R_altrep_class_t array_nonull_dbl_vector::class_t;
+
+void Init_Altrep_classes(DllInfo* dll) {
+  array_nonull_dbl_vector::Init(dll);
+  array_nonull_int_vector::Init(dll);
+}
+
+SEXP Make_array_nonull_dbl_vector(const std::shared_ptr<Array>& array) {
+  return array_nonull::Make(array_nonull_dbl_vector::class_t, array);
+}
+
+SEXP Make_array_nonull_int_vector(const std::shared_ptr<Array>& array) {
+  return array_nonull::Make(array_nonull_int_vector::class_t, array);
+}
+
+}  // namespace r
+}  // namespace arrow
+
+// TODO: when arrow depends on R 3.5 we can eliminate this

Review comment:
       What does this mean? Does Arrow still support R versions before 3.5? @nealrichardson 

##########
File path: r/tests/testthat/test-altrep.R
##########
@@ -0,0 +1,52 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+context("altrep")
+
+skip_if(getRversion() <= "3.5.0")
+
+test_that("altrep vectors from int32 arrays with no nulls", {
+  withr::local_options(list(arrow.use_altrep = TRUE))
+  v <- Array$create(1:1000)
+  expect_true(is_altrep_int_nonull(as.vector(v)))
+  expect_true(is_altrep_int_nonull(as.vector(v$Slice(1))))
+
+  withr::local_options(list(arrow.use_altrep = NULL))
+  v <- Array$create(1:1000)

Review comment:
       Can you also test with an empty array?

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,180 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+struct array_nonull {
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());
+    return TRUE;
+  }
+
+  static std::shared_ptr<Array>& Get(SEXP vec) {
+    return *cpp11::external_pointer<std::shared_ptr<Array>>(R_altrep_data1(vec));
+  }
+
+  static R_xlen_t Length(SEXP vec) { return Get(vec)->length(); }
+
+  static const void* Dataptr_or_null(SEXP vec) {
+    auto& array = Get(vec);

Review comment:
       Nit, but should use `const auto&` when mutation is not intended.

##########
File path: r/tests/testthat/test-altrep.R
##########
@@ -0,0 +1,52 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+context("altrep")
+
+skip_if(getRversion() <= "3.5.0")
+
+test_that("altrep vectors from int32 arrays with no nulls", {
+  withr::local_options(list(arrow.use_altrep = TRUE))
+  v <- Array$create(1:1000)
+  expect_true(is_altrep_int_nonull(as.vector(v)))
+  expect_true(is_altrep_int_nonull(as.vector(v$Slice(1))))
+
+  withr::local_options(list(arrow.use_altrep = NULL))
+  v <- Array$create(1:1000)
+  expect_true(is_altrep_int_nonull(as.vector(v)))
+  expect_true(is_altrep_int_nonull(as.vector(v$Slice(1))))
+
+  withr::local_options(list(arrow.use_altrep = FALSE))
+  expect_false(is_altrep_int_nonull(as.vector(v)))
+  expect_false(is_altrep_int_nonull(as.vector(v$Slice(1))))
+})
+
+test_that("altrep vectors from double arrays with no nulls", {

Review comment:
       Can you add tests for when there are nulls?

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,180 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+struct array_nonull {

Review comment:
       Is there a reason for the `lower_case` naming style? Note our C++ style guide asks for `CamelCase` function and class names.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-858109294


   > I've used `arrow.use_altrep` but I can use `arrow.altrep` all the same.
   
   SGTM
   
   > 
   > I've put `GetBoolOption("arrow.altrep", true)` after the `array->null_count() == 0` because it's cheaper to call `null_count()`.
   
   That surprises me, I'd expect that checking an R boolean would be faster than counting nulls in a (potentially long) array.
   
   > 
   > `GetBoolOption("arrow.altrep", true)` means that this is opt-out, I'm fine with this but it might be risky.
   
   I think defaulting to ALTREP on is good (particularly if/when we quantify its benefits) but leaving an escape hatch in case of error is nice.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz edited a comment on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
bkietz edited a comment on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-864213320


   > I'm not sure why we get this: https://github.com/apache/arrow/pull/10445/checks?check_run_id=2858594249#step:8:2641
   
   Looks like linking the arrow binding failed. There's a [cpp11 symbol](https://github.com/r-lib/cpp11/blob/e2c1768a71a25e05629d9d862008b11fc23375e5/inst/include/cpp11/external_pointer.hpp#L15-L19) missing
   https://github.com/apache/arrow/pull/10445/checks?check_run_id=2858594151#step:9:5579
   
   ... which is odd because it's a template, so it ought to be inlined everywhere


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-856555691


   https://issues.apache.org/jira/browse/ARROW-9140


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-857603402


   I've used `arrow.use_altrep` but I can use `arrow.altrep` all the same. 
   
   I've put `GetBoolOption("arrow.altrep", true)` after the `array->null_count() == 0` because it's cheaper to call `null_count()`. 
   
   `GetBoolOption("arrow.altrep", true)` means that this is opt-out, I'm fine with this but it might be risky. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#discussion_r654608083



##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());
+    return TRUE;
+  }
+
+  static const std::shared_ptr<Array>& Get(SEXP vec) {
+    return *cpp11::external_pointer<std::shared_ptr<Array>>(R_altrep_data1(vec));
+  }
+
+  static R_xlen_t Length(SEXP vec) { return Get(vec)->length(); }
+
+  static const void* Dataptr_or_null(SEXP vec) {
+    return Get(vec)->data()->template GetValues<data_type>(1);
+  }
+
+  static SEXP Duplicate(SEXP vec, Rboolean) {
+    const auto& array = Get(vec);
+    auto size = array->length();
+
+    SEXP copy = PROTECT(Rf_allocVector(sexp_type, array->length()));
+
+    memcpy(DATAPTR(copy), Dataptr_or_null(vec), size * sizeof(data_type));
+
+    UNPROTECT(1);
+    return copy;
+  }
+
+  static void* Dataptr(SEXP vec, Rboolean writeable) {
+    return const_cast<void*>(Dataptr_or_null(vec));
+  }
+
+  // by definition, there are no NA
+  static int No_NA(SEXP vec) { return 1; }
+
+  static void Init(R_altrep_class_t class_t, DllInfo* dll) {
+    // altrep
+    R_set_altrep_Length_method(class_t, ArrayNoNull::Length);
+    R_set_altrep_Inspect_method(class_t, ArrayNoNull::Inspect);
+    R_set_altrep_Duplicate_method(class_t, ArrayNoNull::Duplicate);
+
+    // altvec
+    R_set_altvec_Dataptr_method(class_t, ArrayNoNull::Dataptr);
+    R_set_altvec_Dataptr_or_null_method(class_t, ArrayNoNull::Dataptr_or_null);
+  }
+};
+
+struct DoubleArrayNoNull {
+  static R_altrep_class_t class_t;
+
+  static void Init(DllInfo* dll) {
+    class_t = R_make_altreal_class("array_nonull_dbl_vector", "arrow", dll);
+    ArrayNoNull<REALSXP>::Init(class_t, dll);
+    R_set_altreal_No_NA_method(class_t, ArrayNoNull<REALSXP>::No_NA);
+  }
+
+  static SEXP Make(const std::shared_ptr<Array>& array) {
+    return ArrayNoNull<REALSXP>::Make(class_t, array);
+  }
+};
+
+struct Int32ArrayNoNull {
+  static R_altrep_class_t class_t;
+
+  static void Init(DllInfo* dll) {
+    class_t = R_make_altinteger_class("array_nonull_int_vector", "arrow", dll);
+    ArrayNoNull<INTSXP>::Init(class_t, dll);
+    R_set_altinteger_No_NA_method(class_t, ArrayNoNull<INTSXP>::No_NA);
+  }
+
+  static SEXP Make(const std::shared_ptr<Array>& array) {
+    return ArrayNoNull<INTSXP>::Make(class_t, array);
+  }
+};
+
+R_altrep_class_t Int32ArrayNoNull::class_t;
+R_altrep_class_t DoubleArrayNoNull::class_t;
+
+void Init_Altrep_classes(DllInfo* dll) {
+  DoubleArrayNoNull::Init(dll);
+  Int32ArrayNoNull::Init(dll);
+}
+
+SEXP MakeDoubleArrayNoNull(const std::shared_ptr<Array>& array) {
+  return DoubleArrayNoNull::Make(array);
+}
+
+SEXP MakeInt32ArrayNoNull(const std::shared_ptr<Array>& array) {
+  return Int32ArrayNoNull::Make(array);

Review comment:
       Edge case worthy of mention in a comment somewhere: IIUC this doesn't check for instances of `NA_integer_` in `array` and these will be considered null by R even though `is_altrep_int_nonull(as.vector(array))` (and we report `no_NA = 1`)

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());

Review comment:
       would it be useful to include the external pointer in this?

##########
File path: r/src/array_to_vector.cpp
##########
@@ -144,6 +144,24 @@ Status IngestSome(const std::shared_ptr<arrow::Array>& array, R_xlen_t n,
 // Allocate + Ingest
 SEXP ArrayVector__as_vector(R_xlen_t n, const std::shared_ptr<DataType>& type,
                             const ArrayVector& arrays) {
+  // special case when there is only one array
+#if defined(HAS_ALTREP)

Review comment:
       Nit:
   ```suggestion
   #if defined(HAS_ALTREP)
     // special case when there is only one array
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-864815278


   @nealrichardson I believe this is ready to go


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#discussion_r655614600



##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,166 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+  using Pointer = cpp11::external_pointer<std::shared_ptr<Array>,
+                                          cpp11::default_deleter<std::shared_ptr<Array>>>;

Review comment:
       ```suggestion
     static void DeleteArray(std::shared_ptr<Array>* ptr) {
       delete ptr;
     }
     using Pointer = cpp11::external_pointer<std::shared_ptr<Array>, DeleteArray>;
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on a change in pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on a change in pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#discussion_r655148264



##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());

Review comment:
       Alternatively, we could call the `inspect_subtree()` function on the xp, and then get something like this: 
   
   ```
   > .Internal(inspect(as.vector(Array$create(1:10))))
   @7feaa7212e38 13 INTSXP g0c0 [REF(65535)] arrow::Array<int32, NONULL> len=10, Array=<0x7fea9ed38948>, xp=<0x7feaa7212ea8>)
     @7feaa7212ea8 22 EXTPTRSXP g0c0 [REF(4)] 
   ``` 

##########
File path: r/src/altrep.cpp
##########
@@ -0,0 +1,163 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cpp11/altrep.hpp>
+
+#include "./arrow_types.h"
+
+#if defined(HAS_ALTREP)
+
+#include <R_ext/Altrep.h>
+#include <arrow/array.h>
+
+namespace arrow {
+namespace r {
+
+template <int sexp_type>
+struct ArrayNoNull {
+  using data_type = typename std::conditional<sexp_type == INTSXP, int, double>::type;
+
+  // altrep object around an Array with no nulls
+  // data1: an external pointer to a shared pointer to the Array
+  // data2: not used
+
+  static SEXP Make(R_altrep_class_t class_t, const std::shared_ptr<Array>& array) {
+    // we don't need the whole r6 object, just an external pointer
+    // that retain the array
+    cpp11::external_pointer<std::shared_ptr<Array>> xp(new std::shared_ptr<Array>(array));
+
+    SEXP res = R_new_altrep(class_t, xp, R_NilValue);
+    MARK_NOT_MUTABLE(res);
+
+    return res;
+  }
+
+  static Rboolean Inspect(SEXP x, int pre, int deep, int pvec,
+                          void (*inspect_subtree)(SEXP, int, int, int)) {
+    const auto& array = Get(x);
+    Rprintf("std::shared_ptr<arrow::Array, %s, NONULL> (len=%d, ptr=%p)\n",
+            array->type()->ToString().c_str(), array->length(), array.get());

Review comment:
       ```
   > .Internal(inspect(as.vector(Array$create(1:10))))
   @7fc930ddbbc8 13 INTSXP g0c0 [REF(65535)] arrow::Array<int32, NONULL> len=10, Array=<0x7fc92b27ad98>, xp=<0x7fc930ddbc38>)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-856674081


   @nealrichardson I've turned off making INT64 arrays with no nulls into altrep bit64::integer64 vectors (i.e. reusing the payload of the array as the data of the underlying double array in R ...) because of the auto downcasting to int vectors when it fits: 
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   
   i64 <- Array$create(bit64::as.integer64(1:10))
   .Internal(inspect(as.vector(i64)))
   #> @7ff0b2099c38 13 INTSXP g0c4 [REF(2)] (len=10, tl=0) 1,2,3,4,5,...
   ```
   
   <sup>Created on 2021-06-08 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0)</sup>
   
   I can altogether remove the int64 altrep code if needed ... ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
bkietz commented on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-864213320


   > I'm not sure why we get this: https://github.com/apache/arrow/pull/10445/checks?check_run_id=2858594249#step:8:2641
   
   Looks like linking the arrow binding failed. There's a [cpp11 symbol](https://github.com/r-lib/cpp11/blob/e2c1768a71a25e05629d9d862008b11fc23375e5/inst/include/cpp11/external_pointer.hpp#L15-L19) missing
   https://github.com/apache/arrow/pull/10445/checks?check_run_id=2858594151#step:9:5579


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois edited a comment on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
romainfrancois edited a comment on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-863903044


   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> See arrow_info() for available features
   
   c_int <- ChunkedArray$create(1:1000)
   c_dbl <- ChunkedArray$create(as.numeric(1:1000))
   c_int$num_chunks
   #> [1] 1
   c_dbl$num_chunks
   #> [1] 1
   .Internal(inspect(as.vector(c_int)))
   #> @7fc1314ce2a8 13 INTSXP g0c0 [REF(65535)] std::shared_ptr<arrow::Array, int32, NONULL> (len=1000, ptr=0x7fc1274a0408)
   .Internal(inspect(as.vector(c_dbl)))
   #> @7fc131528a90 14 REALSXP g0c0 [REF(65535)] std::shared_ptr<arrow::Array, double, NONULL> (len=1000, ptr=0x7fc12b2b5178)
   ```
   
   <sup>Created on 2021-06-18 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0.9000)</sup>
   
   I think I'll deal with this https://issues.apache.org/jira/browse/ARROW-13114 here as well, and then do a follow up for using RTasks (i.e. https://issues.apache.org/jira/browse/ARROW-13113). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-863903044






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-858983224


   Ok, I've run some benchmarks on this branch and I'm seeing a huge speed up for floats + integers with `as.vector(array)`. 🎉 
   
   It might be out of scope for this PR, but chunked arrays don't see a similar speed up (which makes sense given they call `ArrayVector__as_vector` directly rather than routing through `Array__as_vector`, so they aren't being using alt rep). I can't quite tell from the cpp if `Table__to_dataframe` would _just work_ with alt rep as well if it worked with ChunkedArrays or if we would need to more to facilitate that.
   
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   
   x <- 1:1e3+ 1L
   v <- Array$create(x)
   x1 <- v$as_vector()  
   .Internal(inspect(x1))
   #> @7f9077f5a1a8 13 INTSXP g0c0 [REF(65535)] std::shared_ptr<arrow::Array, int32, NONULL> (len=1000, ptr=0x7f90975a9a08)
   
   
   v_chunked <- ChunkedArray$create(x)
   x2 <- v_chunked$as_vector()  
   .Internal(inspect(x2))
   #> @7f908312c000 13 INTSXP g0c7 [REF(2)] (len=1000, tl=0) 2,3,4,5,6,...
   ```
   
   <sup>Created on 2021-06-10 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0)</sup>
   
   arrowbench results (using the new benchmarks in https://github.com/ursacomputing/arrowbench/pull/28): 
   [zero-copy-data-conversion.html.zip](https://github.com/apache/arrow/files/6633992/zero-copy-data-conversion.html.zip)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson closed pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

Posted by GitBox <gi...@apache.org>.
nealrichardson closed pull request #10445:
URL: https://github.com/apache/arrow/pull/10445


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org