Posted to commits@spark.apache.org by gu...@apache.org on 2020/04/28 02:05:27 UTC
[spark] branch branch-3.0 updated: [SPARK-31578][R] Vectorize schema validation for arrow in types.R

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new e86e266  [SPARK-31578][R] Vectorize schema validation for arrow in types.R
e86e266 is described below

commit e86e2669173ad1afa2a955f60a3838ac4af22363
Author: Michael Chirico <mi...@grabtaxi.com>
AuthorDate: Tue Apr 28 11:03:51 2020 +0900

    [SPARK-31578][R] Vectorize schema validation for arrow in types.R
    
    ### What changes were proposed in this pull request?
    
    Avoid repeated `sapply` calls in the internal `checkSchemaInArrow` by computing the field type strings once and reusing them for every check.
    
    ### Why are the changes needed?
    
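    The current implementation is doubly inefficient:
    
     1. It repeats nearly the same (95%) `sapply` loop over `schema$fields()` for every type check.
     2. It applies `==` to one scalar at a time inside the loop, although `==` is vectorized and is more efficient when run once over the whole vector.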
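    
    The two points above can be illustrated outside Spark. This is a minimal sketch, not the SparkR code itself: `type_strings` is a hypothetical character vector standing in for the results of `x$dataType.toString()`, and the element values are assumed examples.
    
    ```r
    # Hypothetical type strings standing in for x$dataType.toString() results.
    type_strings <- c("IntegerType", "FloatType", "ArrayType(StringType,true)")
    
    # Before: one sapply pass per check, each comparing a single scalar.
    has_float_old <- any(sapply(type_strings, function(s) s == "FloatType"))
    has_array_old <- any(sapply(type_strings, function(s) startsWith(s, "ArrayType")))
    
    # After: `==` and startsWith() are applied once to the whole vector.
    has_float_new <- any(type_strings == "FloatType")
    has_array_new <- any(startsWith(type_strings, "ArrayType"))
    
    # Both forms agree on the result; only the number of passes differs.
    stopifnot(identical(has_float_old, has_float_new))
    stopifnot(identical(has_array_old, has_array_new))
    ```
    
    In the sketch, the character vector is built once and every later check reuses it, which is the same shape as the `field_strings` variable in the patch.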
    
    ### Does this PR introduce any user-facing change?
    
    No
    
    ### How was this patch tested?
    
    By my trusty friends, the CI bots
    
    Closes #28372 from MichaelChirico/vectorize-types.
    
    Authored-by: Michael Chirico <mi...@grabtaxi.com>
    Signed-off-by: HyukjinKwon <gu...@apache.org>
    (cherry picked from commit 410fa913215665db72f871b556569cba3dc9ee0a)
    Signed-off-by: HyukjinKwon <gu...@apache.org>
---
 R/pkg/R/types.R | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/R/pkg/R/types.R b/R/pkg/R/types.R
index 55f7550..5b41f59 100644
--- a/R/pkg/R/types.R
+++ b/R/pkg/R/types.R
@@ -94,27 +94,22 @@ checkSchemaInArrow <- function(schema) {
   }
 
   # Both cases below produce a corrupt value for unknown reason. It needs to be investigated.
-  if (any(sapply(schema$fields(), function(x) x$dataType.toString() == "FloatType"))) {
+  field_strings <- sapply(schema$fields(), function(x) x$dataType.toString())
+  if (any(field_strings == "FloatType")) {
     stop("Arrow optimization in R does not support float type yet.")
   }
-  if (any(sapply(schema$fields(), function(x) x$dataType.toString() == "BinaryType"))) {
+  if (any(field_strings == "BinaryType")) {
     stop("Arrow optimization in R does not support binary type yet.")
   }
-  if (any(sapply(schema$fields(),
-                 function(x) startsWith(x$dataType.toString(),
-                 "ArrayType")))) {
+  if (any(startsWith(field_strings, "ArrayType"))) {
     stop("Arrow optimization in R does not support array type yet.")
   }
 
   # Arrow optimization in Spark does not yet support both cases below.
-  if (any(sapply(schema$fields(),
-                 function(x) startsWith(x$dataType.toString(),
-                 "StructType")))) {
+  if (any(startsWith(field_strings, "StructType"))) {
     stop("Arrow optimization in R does not support nested struct type yet.")
   }
-  if (any(sapply(schema$fields(),
-                 function(x) startsWith(x$dataType.toString(),
-                 "MapType")))) {
+  if (any(startsWith(field_strings, "MapType"))) {
     stop("Arrow optimization in R does not support map type yet.")
   }
 }
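
To try the post-change logic without a Spark session, the sketch below mirrors only the type checks from the diff above. `mock_field`, `mock_schema`, and `check_types` are hypothetical helpers introduced for illustration and are not part of SparkR; the part of `checkSchemaInArrow` that precedes the hunk is not shown in the diff and is omitted here.

```r
# Hypothetical stand-in for a SparkR field: exposes only the accessor used by the check.
mock_field <- function(type_string) {
  force(type_string)
  list(dataType.toString = function() type_string)
}

# Hypothetical stand-in for a schema object with a fields() accessor.
mock_schema <- function(...) {
  fields <- lapply(c(...), mock_field)
  list(fields = function() fields)
}

# Simplified copy of the checks after the change (type checks only).
check_types <- function(schema) {
  # Collect the type strings once, then reuse them for every check.
  field_strings <- sapply(schema$fields(), function(x) x$dataType.toString())
  if (any(field_strings == "FloatType")) {
    stop("Arrow optimization in R does not support float type yet.")
  }
  if (any(field_strings == "BinaryType")) {
    stop("Arrow optimization in R does not support binary type yet.")
  }
  if (any(startsWith(field_strings, "ArrayType"))) {
    stop("Arrow optimization in R does not support array type yet.")
  }
  if (any(startsWith(field_strings, "StructType"))) {
    stop("Arrow optimization in R does not support nested struct type yet.")
  }
  if (any(startsWith(field_strings, "MapType"))) {
    stop("Arrow optimization in R does not support map type yet.")
  }
  invisible(NULL)
}

# A schema with only supported types passes silently.
ok_result <- check_types(mock_schema("IntegerType", "StringType"))

# A schema containing an array type is rejected; capture the message.
msg <- tryCatch(
  check_types(mock_schema("IntegerType", "ArrayType(StringType,true)")),
  error = function(e) conditionMessage(e)
)
```

Here `msg` holds the array-type error text from the patch, and `ok_result` is `NULL` because no check fired.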

