Posted to commits@spark.apache.org by gu...@apache.org on 2017/10/26 11:54:49 UTC
spark git commit: [SPARK-17902][R] Revive stringsAsFactors option for collect() in SparkR
Repository: spark
Updated Branches:
refs/heads/master 3073344a2 -> a83d8d5ad
[SPARK-17902][R] Revive stringsAsFactors option for collect() in SparkR
## What changes were proposed in this pull request?
This PR proposes to revive the `stringsAsFactors` option in the `collect()` API, which was mistakenly removed in https://github.com/apache/spark/commit/71a138cd0e0a14e8426f97877e3b52a562bbd02c.
Specifically, during primitive type conversion it casts a `character` vector to `factor` when the condition `stringsAsFactors && is.character(vec)` holds.
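The conversion step can be sketched in isolation. This is a minimal, hypothetical stand-alone function (`convert_column` is a name made up for illustration, not part of SparkR) that mirrors the added check. It assumes the column arrives as a list of per-row values and skips the `PRIMITIVE_TYPES` class mapping that the real `collect()` applies first.

```r
# Hypothetical sketch of the per-column conversion step in collect();
# not the SparkR source. Assumes `col` is a list with one value per row
# and omits the PRIMITIVE_TYPES class mapping done by the real code.
convert_column <- function(col, stringsAsFactors = FALSE) {
  # Flatten the per-row list into an atomic vector.
  vec <- do.call(c, col)
  stopifnot(class(vec) != "list")
  # The revived check: only character vectors become factors,
  # and only when the caller asks for it.
  if (is.character(vec) && stringsAsFactors) {
    vec <- as.factor(vec)
  }
  vec
}

chars <- list("b", "a", "b")
class(convert_column(chars))                                  # "character"
class(convert_column(chars, stringsAsFactors = TRUE))         # "factor"
class(convert_column(list(1.5, 2.5), stringsAsFactors = TRUE)) # "numeric"
```

Non-character columns pass through unchanged even with `stringsAsFactors = TRUE`, which matches the `is.character(vec) &&` guard in the diff.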
## How was this patch tested?
Added a unit test in `R/pkg/tests/fulltests/test_sparkSQL.R`.
Author: hyukjinkwon <gu...@gmail.com>
Closes #19551 from HyukjinKwon/SPARK-17902.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a83d8d5a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a83d8d5a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a83d8d5a
Branch: refs/heads/master
Commit: a83d8d5adcb4e0061e43105767242ba9770dda96
Parents: 3073344
Author: hyukjinkwon <gu...@gmail.com>
Authored: Thu Oct 26 20:54:36 2017 +0900
Committer: hyukjinkwon <gu...@gmail.com>
Committed: Thu Oct 26 20:54:36 2017 +0900
----------------------------------------------------------------------
R/pkg/R/DataFrame.R | 3 +++
R/pkg/tests/fulltests/test_sparkSQL.R | 6 ++++++
2 files changed, 9 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/a83d8d5a/R/pkg/R/DataFrame.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 176bb3b..aaa3349 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -1191,6 +1191,9 @@ setMethod("collect",
vec <- do.call(c, col)
stopifnot(class(vec) != "list")
class(vec) <- PRIMITIVE_TYPES[[colType]]
+ if (is.character(vec) && stringsAsFactors) {
+ vec <- as.factor(vec)
+ }
df[[colIndex]] <- vec
} else {
df[[colIndex]] <- col
http://git-wip-us.apache.org/repos/asf/spark/blob/a83d8d5a/R/pkg/tests/fulltests/test_sparkSQL.R
----------------------------------------------------------------------
diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index 4382ef2..0c8118a 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -499,6 +499,12 @@ test_that("create DataFrame with different data types", {
expect_equal(collect(df), data.frame(l, stringsAsFactors = FALSE))
})
+test_that("SPARK-17902: collect() with stringsAsFactors enabled", {
+ df <- suppressWarnings(collect(createDataFrame(iris), stringsAsFactors = TRUE))
+ expect_equal(class(iris$Species), class(df$Species))
+ expect_equal(iris$Species, df$Species)
+})
+
test_that("SPARK-17811: can create DataFrame containing NA as date and time", {
df <- data.frame(
id = 1:2,
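The new test compares the collected `Species` column directly with `iris$Species`. A plain base-R sketch (no SparkR involved; it only imitates the character-to-factor step) suggests why that comparison can succeed: `as.factor()` builds levels from the sorted unique values, and the `iris` levels already happen to be in sorted order. The `sizes` factor below is a made-up counter-example showing that levels in a non-sorted order would not survive such a round trip.

```r
# Base-R illustration only; it does not call SparkR.
species_chr <- as.character(iris$Species)  # a string column as collected
species_fct <- as.factor(species_chr)      # what stringsAsFactors = TRUE yields

levels(species_fct)                   # "setosa" "versicolor" "virginica"
identical(species_fct, iris$Species)  # TRUE: levels are already sorted

# Hypothetical counter-example: non-alphabetical levels are not recovered.
sizes <- factor(c("small", "large"), levels = c("small", "large"))
identical(as.factor(as.character(sizes)), sizes)  # FALSE
```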