You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/15 13:07:30 UTC

[GitHub] [arrow] romainfrancois opened a new pull request #7435: ARROW-8779: [R] Implement conversion to List

romainfrancois opened a new pull request #7435:
URL: https://github.com/apache/arrow/pull/7435


   ``` r
   library(arrow)
   #> 
   #> Attaching package: 'arrow'
   #> The following object is masked from 'package:utils':
   #> 
   #>     timestamp
   Array$create(list(data.frame(a = 1:3, b = 11:13)))
   #> ListArray
   #> <list<item: struct<a: int32, b: int32>>>
   #> [
   #>   -- is_valid: all not null
   #>   -- child 0 type: int32
   #>     [
   #>       1,
   #>       2,
   #>       3
   #>     ]
   #>   -- child 1 type: int32
   #>     [
   #>       11,
   #>       12,
   #>       13
   #>     ]
   #> ]
   ```
   
   <sup>Created on 2020-06-15 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
   
   However does not yet deal with the original issue: 
   
   ``` r
   library(arrow)
   #> 
   #> Attaching package: 'arrow'
   #> The following object is masked from 'package:utils':
   #> 
   #>     timestamp
   
   nrows <- 1:3
   df <- tibble::tibble(
       id = 1L,
       data = 
         list(
           tibble::tibble(
             a = letters[nrows],
             b = as.integer(nrows),
             c = as.factor(a),
             d = b/2
           )
         )
     )
   Table$create(df)
   #> Error in Table__from_dots(dots, schema): NotImplemented: Converting vector to arrow type dictionary<values=string, indices=int8, ordered=0> not implemented
   ```
   
   <sup>Created on 2020-06-15 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on pull request #7435: ARROW-8779: [R] Implement conversion to List

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#issuecomment-644800961


   also added `VectorToArrayConverter::Visit` for dictionaries to that we can handle things like list(factor()), the implementation is simpler than the original `MakeFactorArray` as it leverages `StringDictionaryBuilder`, but it looks like that does not respect the "ordered" property. 
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   
   o <- ordered(c("b", "a"), levels = c("a", "b", "c"))
   
   # this goes trough MakeFactorArray
   Array$create(o)$type
   #> DictionaryType
   #> dictionary<values=string, indices=int8, ordered>
   
   # this goes through VectorToArrayConverter::Visit >> MakeBuilder() ...
   Array$create(list(o))$type$value_type
   #> DictionaryType
   #> dictionary<values=string, indices=int8>
   ```
   
   <sup>Created on 2020-06-16 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on a change in pull request #7435: ARROW-8779: [R] Implement conversion to List

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on a change in pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#discussion_r443386121



##########
File path: r/src/array_from_vector.cpp
##########
@@ -201,6 +202,67 @@ struct VectorToArrayConverter {
     return Status::OK();
   }
 
+  template <typename T>
+  arrow::enable_if_t<is_struct_type<T>::value, Status> Visit(const T& type) {
+    using BuilderType = typename TypeTraits<T>::BuilderType;
+    ARROW_RETURN_IF(!Rf_inherits(x, "data.frame"),
+                    Status::RError("Expecting a data frame"));
+
+    auto* struct_builder = checked_cast<BuilderType*>(builder);
+
+    int64_t n = vctrs::short_vec_size(x);

Review comment:
       `vctrs::short_vec_size()` knows how to get the size of things like data frames with 0 columns, data frames with data frame columns ... 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] fsaintjacques commented on pull request #7435: ARROW-8779: [R] Implement conversion to List

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#issuecomment-644308992


   I also updated ARROW-7798 with comments to avoid builder in the conversion.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] fsaintjacques commented on a change in pull request #7435: ARROW-8779: [R] Implement conversion to List

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on a change in pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#discussion_r440369138



##########
File path: r/src/array_from_vector.cpp
##########
@@ -201,6 +201,39 @@ struct VectorToArrayConverter {
     return Status::OK();
   }
 
+  template <typename T>
+  arrow::enable_if_t<arrow::is_struct_type<T>::value, Status> Visit(const T& type) {
+    using BuilderType = typename TypeTraits<T>::BuilderType;
+    ARROW_RETURN_IF(!Rf_inherits(x, "data.frame"),
+                    Status::RError("Expecting a data frame"));
+
+    auto* struct_builder = checked_cast<BuilderType*>(builder);
+
+    int64_t n = 0;
+    if (XLENGTH(x) > 0) {
+      // TODO: need a more generic way to get the vec_size(<data.frame>)
+      //       perhaps using vctrs::::short_vec_size
+      n = XLENGTH(VECTOR_ELT(x, 0));
+    }
+    RETURN_NOT_OK(struct_builder->Reserve(n));
+
+    // TODO: double check but seems fine that no value is NULL

Review comment:
       You can remove the TODO, this is correct.

##########
File path: r/tests/testthat/test-Array.R
##########
@@ -445,6 +445,9 @@ test_that("Array$create() handles vector -> list arrays (ARROW-7662)", {
   expect_array_roundtrip(list(character(0)), list_of(utf8()))
   expect_array_roundtrip(list("itsy", c("bitsy", "spider"), c("is")), list_of(utf8()))
   expect_array_roundtrip(list("itsy", character(0), c("bitsy", "spider", NA_character_), c("is")), list_of(utf8()))
+
+  # struct
+  expect_array_roundtrip(list(tibble::tibble(a = integer(0))), list_of(struct(a = int32())))

Review comment:
       Add a bit more columns: float, string, bool. Also add a failing case where columns are not of the same size.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on a change in pull request #7435: ARROW-8779: [R] Implement conversion to List

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#discussion_r441170772



##########
File path: r/src/array_from_vector.cpp
##########
@@ -201,6 +202,67 @@ struct VectorToArrayConverter {
     return Status::OK();
   }
 
+  template <typename T>
+  arrow::enable_if_t<is_struct_type<T>::value, Status> Visit(const T& type) {
+    using BuilderType = typename TypeTraits<T>::BuilderType;
+    ARROW_RETURN_IF(!Rf_inherits(x, "data.frame"),
+                    Status::RError("Expecting a data frame"));
+
+    auto* struct_builder = checked_cast<BuilderType*>(builder);
+
+    int64_t n = vctrs::short_vec_size(x);

Review comment:
       It seems in R it's possible to have data-frame-like tibble columns that are malformed viewed as Arrow-like (as evidenced by the unit test)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on a change in pull request #7435: ARROW-8779: [R] Implement conversion to List

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on a change in pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#discussion_r440723407



##########
File path: r/tests/testthat/test-Array.R
##########
@@ -445,6 +445,9 @@ test_that("Array$create() handles vector -> list arrays (ARROW-7662)", {
   expect_array_roundtrip(list(character(0)), list_of(utf8()))
   expect_array_roundtrip(list("itsy", c("bitsy", "spider"), c("is")), list_of(utf8()))
   expect_array_roundtrip(list("itsy", character(0), c("bitsy", "spider", NA_character_), c("is")), list_of(utf8()))
+
+  # struct
+  expect_array_roundtrip(list(tibble::tibble(a = integer(0))), list_of(struct(a = int32())))

Review comment:
       Thanks. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on pull request #7435: ARROW-8779: [R] Implement conversion to List

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#issuecomment-644690630


   I need to add tests for this: 
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   
   nrows <- 1:3
   df <- tibble::tibble(
     id = 1L,
     data = 
       list(
         tibble::tibble(
           a = letters[nrows],
           b = as.integer(nrows),
           c = as.factor(a),
           d = b/2
         )
       )
   )
   tab <- Table$create(df)
   tab$schema
   #> Schema
   #> id: int32
   #> data: list<item: struct<a: string, b: int32, c: dictionary<values=string, indices=int8>, d: double>>
   as.data.frame(tab)
   #> # A tibble: 1 x 2
   #>      id data            
   #>   <int> <list>          
   #> 1     1 <tibble [3 × 4]>
   as.data.frame(tab)$data
   #> [[1]]
   #> # A tibble: 3 x 4
   #>   a         b c         d
   #>   <chr> <int> <fct> <dbl>
   #> 1 a         1 a       0.5
   #> 2 b         2 b       1  
   #> 3 c         3 c       1.5
   ```
   
   <sup>Created on 2020-06-16 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7435: ARROW-8779: [R] Implement conversion to List

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#issuecomment-644128787


   https://issues.apache.org/jira/browse/ARROW-8779


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #7435: ARROW-8779: [R] Implement conversion to List

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#discussion_r443735874



##########
File path: r/src/array_from_vector.cpp
##########
@@ -201,6 +202,67 @@ struct VectorToArrayConverter {
     return Status::OK();
   }
 
+  template <typename T>
+  arrow::enable_if_t<is_struct_type<T>::value, Status> Visit(const T& type) {
+    using BuilderType = typename TypeTraits<T>::BuilderType;
+    ARROW_RETURN_IF(!Rf_inherits(x, "data.frame"),
+                    Status::RError("Expecting a data frame"));
+
+    auto* struct_builder = checked_cast<BuilderType*>(builder);
+
+    int64_t n = vctrs::short_vec_size(x);

Review comment:
       See also https://jira.apache.org/jira/browse/ARROW-9028 and some skipped tests about tables with 0 columns




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm closed pull request #7435: ARROW-8779: [R] Implement conversion to List

Posted by GitBox <gi...@apache.org>.
wesm closed pull request #7435:
URL: https://github.com/apache/arrow/pull/7435


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on pull request #7435: ARROW-8779: [R] Implement conversion to List

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#issuecomment-644147517


   @github-actions autotune everything


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] fsaintjacques commented on a change in pull request #7435: ARROW-8779: [R] Implement conversion to List

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on a change in pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#discussion_r441040428



##########
File path: r/src/array_from_vector.cpp
##########
@@ -201,6 +202,67 @@ struct VectorToArrayConverter {
     return Status::OK();
   }
 
+  template <typename T>
+  arrow::enable_if_t<is_struct_type<T>::value, Status> Visit(const T& type) {
+    using BuilderType = typename TypeTraits<T>::BuilderType;
+    ARROW_RETURN_IF(!Rf_inherits(x, "data.frame"),
+                    Status::RError("Expecting a data frame"));
+
+    auto* struct_builder = checked_cast<BuilderType*>(builder);
+
+    int64_t n = vctrs::short_vec_size(x);

Review comment:
       Why do you need this?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7435: ARROW-8779: [R] Implement conversion to List

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7435:
URL: https://github.com/apache/arrow/pull/7435#issuecomment-645037889


   thanks @romainfrancois!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org