Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/16 13:26:28 UTC

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7441: ARROW-3446: [R] Document mapping of Arrow <-> R types

fsaintjacques commented on a change in pull request #7441:
URL: https://github.com/apache/arrow/pull/7441#discussion_r440846823



##########
File path: r/vignettes/arrow.Rmd
##########
@@ -86,7 +88,73 @@ to other applications and services that use Arrow. One example is Spark: the
 move data to and from Spark, yielding [significant performance
 gains](http://arrow.apache.org/blog/2019/01/25/r-spark-improvements/).
 
-# Class structure and package conventions
+# Internals
+
+## Mapping of R <--> Arrow types
+
+Arrow has a rich data type system that includes direct parallels with R's data types and much more.
+
+In the tables, entries with a `-` are not currently implemented.
+
+### R to Arrow
+
+| R type                   | Arrow type |
+|--------------------------|------------|
+| logical                  | boolean    |
+| integer                  | int32      |
+| double ("numeric")       | float64    |
+| character                | utf8       |
+| factor                   | dictionary |
+| raw                      | uint8      |
+| Date                     | date32     |
+| POSIXct                  | timestamp  |
+| POSIXlt                  | -          |
+| data.frame               | struct     |
+| list^+^                  | list       |
+| bit64::integer64         | int64      |
+| difftime                 | time32     |
+| vctrs::vctrs_unspecified | null       |
+
+^+^: Only lists where all elements are the same type are able to be translated to Arrow list type (which is a "list of" some type).
+
+### Arrow to R
+
+| Arrow type        | R type                   |
+|-------------------|--------------------------|
+| boolean           | logical                  |
+| int8              | integer                  |
+| int16             | integer                  |
+| int32             | integer                  |
+| int64             | bit64::integer64         |
+| uint8             | integer                  |
+| uint16            | integer                  |
+| uint32            | double                   |
+| uint64            | -                        |
+| float16           | -                   |

Review comment:
       Align.

##########
File path: r/vignettes/arrow.Rmd
##########
@@ -86,7 +88,73 @@ to other applications and services that use Arrow. One example is Spark: the
 move data to and from Spark, yielding [significant performance
 gains](http://arrow.apache.org/blog/2019/01/25/r-spark-improvements/).
 
-# Class structure and package conventions
+# Internals
+
+## Mapping of R <--> Arrow types
+
+Arrow has a rich data type system that includes direct parallels with R's data types and much more.
+
+In the tables, entries with a `-` are not currently implemented.
+
+### R to Arrow
+
+| R type                   | Arrow type |
+|--------------------------|------------|
+| logical                  | boolean    |
+| integer                  | int32      |
+| double ("numeric")       | float64    |
+| character                | utf8       |
+| factor                   | dictionary |
+| raw                      | uint8      |
+| Date                     | date32     |
+| POSIXct                  | timestamp  |
+| POSIXlt                  | -          |
+| data.frame               | struct     |
+| list^+^                  | list       |
+| bit64::integer64         | int64      |
+| difftime                 | time32     |
+| vctrs::vctrs_unspecified | null       |
+
+^+^: Only lists where all elements are the same type are able to be translated to Arrow list type (which is a "list of" some type).
+
+### Arrow to R
+
+| Arrow type        | R type                   |
+|-------------------|--------------------------|
+| boolean           | logical                  |
+| int8              | integer                  |
+| int16             | integer                  |
+| int32             | integer                  |
+| int64             | bit64::integer64         |
+| uint8             | integer                  |
+| uint16            | integer                  |
+| uint32            | double                   |

Review comment:
       I'm curious about the double: is it a copy-paste error, or does it really cast to double?
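
       For context, a minimal sketch of why a widening to double would be plausible here. It does not show what the arrow R package actually does. R's integer type is a 32-bit signed value, so the upper half of the uint32 range does not fit in it, while a double (53-bit significand) holds every uint32 value exactly. The helper names below are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of the arrow package: widen a uint32 value
// to double, the R type the table lists for uint32.
double uint32_to_double(std::uint32_t value) {
  return static_cast<double>(value);
}

// True when the value fits in a 32-bit signed integer, which is what
// R's integer type is.
bool fits_in_r_integer(std::uint32_t value) {
  return value <=
         static_cast<std::uint32_t>(std::numeric_limits<std::int32_t>::max());
}

// A double has a 53-bit significand, so every 32-bit unsigned value
// converts to double and back without loss.
bool round_trips_exactly(std::uint32_t value) {
  return static_cast<std::uint32_t>(uint32_to_double(value)) == value;
}

// Checks the two boundary cases that motivate the mapping.
void check_uint32_widening() {
  const std::uint32_t largest = std::numeric_limits<std::uint32_t>::max();
  assert(!fits_in_r_integer(largest));   // too large for a 32-bit signed int
  assert(round_trips_exactly(largest));  // but exact as a double
}
```

       Under these assumptions, mapping uint32 to double would be a lossless choice rather than a copy-paste error, though only the converter source can confirm which it is.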

##########
File path: r/vignettes/arrow.Rmd
##########
@@ -86,7 +88,73 @@ to other applications and services that use Arrow. One example is Spark: the
 move data to and from Spark, yielding [significant performance
 gains](http://arrow.apache.org/blog/2019/01/25/r-spark-improvements/).
 
-# Class structure and package conventions
+# Internals
+
+## Mapping of R <--> Arrow types
+
+Arrow has a rich data type system that includes direct parallels with R's data types and much more.
+
+In the tables, entries with a `-` are not currently implemented.
+
+### R to Arrow
+
+| R type                   | Arrow type |
+|--------------------------|------------|
+| logical                  | boolean    |
+| integer                  | int32      |
+| double ("numeric")       | float64    |
+| character                | utf8       |
+| factor                   | dictionary |
+| raw                      | uint8      |
+| Date                     | date32     |
+| POSIXct                  | timestamp  |
+| POSIXlt                  | -          |
+| data.frame               | struct     |
+| list^+^                  | list       |
+| bit64::integer64         | int64      |
+| difftime                 | time32     |
+| vctrs::vctrs_unspecified | null       |
+
+^+^: Only lists where all elements are the same type are able to be translated to Arrow list type (which is a "list of" some type).
+
+### Arrow to R
+
+| Arrow type        | R type                   |
+|-------------------|--------------------------|
+| boolean           | logical                  |
+| int8              | integer                  |
+| int16             | integer                  |
+| int32             | integer                  |
+| int64             | bit64::integer64         |
+| uint8             | integer                  |
+| uint16            | integer                  |
+| uint32            | double                   |
+| uint64            | -                        |
+| float16           | -                   |
+| float32           | double                   |
+| float64           | double                   |
+| utf8              | character                |
+| binary            | -                        |
+| fixed_size_binary | -                        |
+| date32            | Date                     |
+| date64            | POSIXct                  |
+| time32            | hms::difftime            |
+| time64            | hms::difftime            |
+| timestamp         | POSIXct                  |
+| duration          | -                        |
+| decimal           | double                   |
+| dictionary        | factor                   |

Review comment:
       Does the factor mapping apply only to primitive types?

##########
File path: r/src/array_to_vector.cpp
##########
@@ -418,6 +418,7 @@ class Converter_Struct : public Converter {
   std::vector<std::shared_ptr<Converter>> converters;
 };
 
+// Shouldn't this cast before dividing? Otherwise we're doing integer division

Review comment:
       Depends on what you want; integer division is much faster than floating-point division.
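
       To make the trade-off concrete, here is a small sketch. The code being commented on is not shown in this hunk, so the millisecond-to-second conversion and both function names are assumptions chosen for illustration, not the converter's actual code.

```cpp
#include <cstdint>

// Hypothetical example: convert a millisecond count to seconds in two ways.

// Integer division: cheap, but the fractional part is truncated toward zero.
std::int64_t millis_to_seconds_truncated(std::int64_t millis) {
  return millis / 1000;
}

// Cast first, then divide: keeps the fraction at the cost of a
// floating-point division.
double millis_to_seconds_exact(std::int64_t millis) {
  return static_cast<double>(millis) / 1000;
}
```

       With an input of 1500, the first version yields 1 and the second yields 1.5. Which one is right depends on whether the target R type needs sub-second precision.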



