Posted to commits@spark.apache.org by gu...@apache.org on 2020/04/28 02:05:27 UTC
[spark] branch branch-3.0 updated: [SPARK-31578][R] Vectorize schema validation for arrow in types.R
This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
new e86e266 [SPARK-31578][R] Vectorize schema validation for arrow in types.R
e86e266 is described below
commit e86e2669173ad1afa2a955f60a3838ac4af22363
Author: Michael Chirico <mi...@grabtaxi.com>
AuthorDate: Tue Apr 28 11:03:51 2020 +0900
[SPARK-31578][R] Vectorize schema validation for arrow in types.R
### What changes were proposed in this pull request?
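Avoid the repeated `sapply` calls in the internal function `checkSchemaInArrow`: compute the field type strings once, then check them with vectorized comparisons.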
### Why are the changes needed?
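The current implementation is inefficient in two ways:
1. It runs an almost identical (~95% the same) `sapply` loop for every type check, recomputing each field's type string each time.
2. It applies `==` (or `startsWith`) to one scalar at a time inside the loop, although both are vectorized in R and are more efficient when applied once to the whole character vector.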
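Both inefficiencies can be illustrated outside Spark with a minimal sketch. It assumes a hypothetical helper `make_field` that builds mock fields imitating only the `x$dataType.toString()` call used by `checkSchemaInArrow`, plus an illustrative counter `n_calls`; neither is part of SparkR. The sketch runs the five checks in the old and new styles, confirms that they agree, and counts how often a type string is computed.

```r
# Illustrative counter: how many times a field's type string is computed.
n_calls <- 0L

# Hypothetical stand-in for a Spark schema field. It only mimics the
# x$dataType.toString() call; it is not the real SparkR structField class.
make_field <- function(type_string) {
  force(type_string)
  list(dataType.toString = function() {
    n_calls <<- n_calls + 1L
    type_string
  })
}

fields <- lapply(c("IntegerType", "StringType", "ArrayType(IntegerType,true)"),
                 make_field)

# Old style: a separate sapply loop (and scalar comparison) per check.
n_calls <- 0L
old_hits <- c(
  any(sapply(fields, function(x) x$dataType.toString() == "FloatType")),
  any(sapply(fields, function(x) x$dataType.toString() == "BinaryType")),
  any(sapply(fields, function(x) startsWith(x$dataType.toString(), "ArrayType"))),
  any(sapply(fields, function(x) startsWith(x$dataType.toString(), "StructType"))),
  any(sapply(fields, function(x) startsWith(x$dataType.toString(), "MapType")))
)
old_calls <- n_calls

# New style: compute the strings once, then compare whole vectors.
n_calls <- 0L
field_strings <- sapply(fields, function(x) x$dataType.toString())
new_hits <- c(
  any(field_strings == "FloatType"),
  any(field_strings == "BinaryType"),
  any(startsWith(field_strings, "ArrayType")),
  any(startsWith(field_strings, "StructType")),
  any(startsWith(field_strings, "MapType"))
)
new_calls <- n_calls

stopifnot(identical(old_hits, new_hits))
print(c(old = old_calls, new = new_calls))
```

With three fields, the old style computes the type strings 15 times (five full loops over three fields, since `any(sapply(...))` does not stop early), while the new style computes them 3 times; the check results are identical.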
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
By the existing tests run by the CI bots.
Closes #28372 from MichaelChirico/vectorize-types.
Authored-by: Michael Chirico <mi...@grabtaxi.com>
Signed-off-by: HyukjinKwon <gu...@apache.org>
(cherry picked from commit 410fa913215665db72f871b556569cba3dc9ee0a)
Signed-off-by: HyukjinKwon <gu...@apache.org>
---
R/pkg/R/types.R | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/R/pkg/R/types.R b/R/pkg/R/types.R
index 55f7550..5b41f59 100644
--- a/R/pkg/R/types.R
+++ b/R/pkg/R/types.R
@@ -94,27 +94,22 @@ checkSchemaInArrow <- function(schema) {
}
# Both cases below produce a corrupt value for unknown reason. It needs to be investigated.
- if (any(sapply(schema$fields(), function(x) x$dataType.toString() == "FloatType"))) {
+ field_strings <- sapply(schema$fields(), function(x) x$dataType.toString())
+ if (any(field_strings == "FloatType")) {
stop("Arrow optimization in R does not support float type yet.")
}
- if (any(sapply(schema$fields(), function(x) x$dataType.toString() == "BinaryType"))) {
+ if (any(field_strings == "BinaryType")) {
stop("Arrow optimization in R does not support binary type yet.")
}
- if (any(sapply(schema$fields(),
- function(x) startsWith(x$dataType.toString(),
- "ArrayType")))) {
+ if (any(startsWith(field_strings, "ArrayType"))) {
stop("Arrow optimization in R does not support array type yet.")
}
# Arrow optimization in Spark does not yet support both cases below.
- if (any(sapply(schema$fields(),
- function(x) startsWith(x$dataType.toString(),
- "StructType")))) {
+ if (any(startsWith(field_strings, "StructType"))) {
stop("Arrow optimization in R does not support nested struct type yet.")
}
- if (any(sapply(schema$fields(),
- function(x) startsWith(x$dataType.toString(),
- "MapType")))) {
+ if (any(startsWith(field_strings, "MapType"))) {
stop("Arrow optimization in R does not support map type yet.")
}
}
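The patched checks can also be exercised without a Spark session. The sketch below assumes a hypothetical helper `check_field_types` (not part of SparkR) that applies the same five checks and error messages to a character vector, and uses mock fields in place of a real schema. One deliberate difference from the commit: it builds the vector with `vapply(..., character(1))` rather than `sapply`. That is a design choice of this sketch, not of the patch: `vapply` always returns a character vector (`character(0)` for an empty field list, where `sapply` would return `list()`), and `startsWith()` only accepts character input.

```r
# Hypothetical helper (not part of SparkR): the same vectorized checks as the
# patched checkSchemaInArrow, applied to a character vector of type strings.
check_field_types <- function(field_strings) {
  stopifnot(is.character(field_strings))
  if (any(field_strings == "FloatType")) {
    stop("Arrow optimization in R does not support float type yet.")
  }
  if (any(field_strings == "BinaryType")) {
    stop("Arrow optimization in R does not support binary type yet.")
  }
  if (any(startsWith(field_strings, "ArrayType"))) {
    stop("Arrow optimization in R does not support array type yet.")
  }
  if (any(startsWith(field_strings, "StructType"))) {
    stop("Arrow optimization in R does not support nested struct type yet.")
  }
  if (any(startsWith(field_strings, "MapType"))) {
    stop("Arrow optimization in R does not support map type yet.")
  }
  invisible(TRUE)
}

# Hypothetical mock fields standing in for schema$fields().
fields <- list(
  list(dataType.toString = function() "IntegerType"),
  list(dataType.toString = function() "MapType(StringType,IntegerType,true)")
)

# vapply guarantees exactly one string per field and a character result.
field_strings <- vapply(fields, function(x) x$dataType.toString(), character(1))

msg <- tryCatch(check_field_types(field_strings),
                error = function(e) conditionMessage(e))
print(msg)
```

Here the map-typed field triggers the last check, so `msg` holds the map-type error message; calling `check_field_types(character(0))` passes all checks.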