You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by dilipbiswal <gi...@git.apache.org> on 2018/08/14 18:15:01 UTC
[GitHub] spark pull request #22107: [SPARK-25117] Add EXEPT ALL and INTERSECT ALL sup...
GitHub user dilipbiswal opened a pull request:
https://github.com/apache/spark/pull/22107
[SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in R
## What changes were proposed in this pull request?
[SPARK-21274](https://issues.apache.org/jira/browse/SPARK-21274) added support for EXCEPT ALL and INTERSECT ALL. This PR adds the support in R.
## How was this patch tested?
Added test in test_sparkSQL.R
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dilipbiswal/spark SPARK-25117
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22107.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22107
----
commit 6076700fb99733447c19f5887ecf43d0f422c7d4
Author: Dilip Biswal <db...@...>
Date: 2018-08-14T18:06:47Z
[SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in R
commit 426ffeda2e31ebe60f777ca4c172fa79e5c45f2f
Author: Dilip Biswal <db...@...>
Date: 2018-08-14T18:11:37Z
minor fix
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22107
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22107
**[Test build #94760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94760/testReport)** for PR 22107 at commit [`426ffed`](https://github.com/apache/spark/commit/426ffeda2e31ebe60f777ca4c172fa79e5c45f2f).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94777/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22107#discussion_r210156892
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -2482,6 +2482,32 @@ test_that("union(), unionByName(), rbind(), except(), and intersect() on a DataF
unlink(jsonPath2)
})
+test_that("intersectAll() and exceptAll()", {
+ df1 <- createDataFrame(
+ list(list("a", 1),
+ list("a", 1),
+ list("a", 1),
+ list("a", 1),
+ list("b", 3),
+ list("c", 4)),
--- End diff --
nit:
```r
list(list("a", 1), list("a", 1), list("a", 1),
list("a", 1), list("b", 3), list("c", 4)),
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2184/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22107#discussion_r210157193
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -2482,6 +2482,32 @@ test_that("union(), unionByName(), rbind(), except(), and intersect() on a DataF
unlink(jsonPath2)
})
+test_that("intersectAll() and exceptAll()", {
+ df1 <- createDataFrame(
+ list(list("a", 1),
+ list("a", 1),
+ list("a", 1),
+ list("a", 1),
+ list("b", 3),
+ list("c", 4)),
+ schema = c("a", "b"))
+ df2 <- createDataFrame(
+ list(list("a", 1), list("a", 1), list("b", 3)),
+ schema = c("a", "b"))
--- End diff --
nit:
```r
df2 <- createDataFrame(list(list("a", 1), list("a", 1), list("b", 3)), schema = c("a", "b"))
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/22107
merged to master
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22107
**[Test build #94764 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94764/testReport)** for PR 22107 at commit [`7e88c9d`](https://github.com/apache/spark/commit/7e88c9dd23f0e889f61b6392980363f2b63c0117).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/22107#discussion_r210488641
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2876,6 +2905,37 @@ setMethod("except",
dataFrame(excepted)
})
+#' exceptAll
+#'
+#' Return a new SparkDataFrame containing rows in this SparkDataFrame
+#' but not in another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{EXCEPT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the except all operation.
+#' @family SparkDataFrame functions
+#' @aliases exceptAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname exceptAll
+#' @name exceptAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' exceptAllDF <- exceptAll(df1, df2)
+#' }
+#' @rdname exceptAll
--- End diff --
this is a bug in `except` there should only be one `@rdname` for each
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2240/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2203/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22107
cc @felixcheung
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/22107#discussion_r210146526
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2848,6 +2848,35 @@ setMethod("intersect",
dataFrame(intersected)
})
+#' intersectAll
+#'
+#' Return a new SparkDataFrame containing rows in both this SparkDataFrame
+#' and another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{INTERSECT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the intersect all operation.
+#' @family SparkDataFrame functions
+#' @aliases intersectAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname intersectAll
+#' @name intersectAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' intersectAllDF <- intersectAll(df1, df2)
+#' }
+#' @rdname intersectAll
+#' @note intersectAll since 2.4
--- End diff --
@felixcheungu Ok.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22107
**[Test build #94791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94791/testReport)** for PR 22107 at commit [`1d93304`](https://github.com/apache/spark/commit/1d93304290909617c1ddb794f3599907d09cad3d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/22107#discussion_r210490166
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2876,6 +2905,37 @@ setMethod("except",
dataFrame(excepted)
})
+#' exceptAll
+#'
+#' Return a new SparkDataFrame containing rows in this SparkDataFrame
+#' but not in another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{EXCEPT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the except all operation.
+#' @family SparkDataFrame functions
+#' @aliases exceptAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname exceptAll
+#' @name exceptAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' exceptAllDF <- exceptAll(df1, df2)
+#' }
+#' @rdname exceptAll
+#' @note exceptAll since 2.4.0
+setMethod("exceptAll",
+ signature(x = "SparkDataFrame", y = "SparkDataFrame"),
+ function(x, y) {
+ excepted <- callJMethod(x@sdf, "exceptAll", y@sdf)
+ dataFrame(excepted)
+ })
+
--- End diff --
@felixcheung Sure.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22107
**[Test build #94789 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94789/testReport)** for PR 22107 at commit [`1d93304`](https://github.com/apache/spark/commit/1d93304290909617c1ddb794f3599907d09cad3d).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22107
**[Test build #94777 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94777/testReport)** for PR 22107 at commit [`5247ab5`](https://github.com/apache/spark/commit/5247ab5ef79c7d28db5298aea45b7dcad5ec8ab8).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22107
**[Test build #94791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94791/testReport)** for PR 22107 at commit [`1d93304`](https://github.com/apache/spark/commit/1d93304290909617c1ddb794f3599907d09cad3d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/22107#discussion_r210488842
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2848,6 +2848,35 @@ setMethod("intersect",
dataFrame(intersected)
})
+#' intersectAll
+#'
+#' Return a new SparkDataFrame containing rows in both this SparkDataFrame
+#' and another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{INTERSECT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the intersect all operation.
+#' @family SparkDataFrame functions
+#' @aliases intersectAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname intersectAll
+#' @name intersectAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' intersectAllDF <- intersectAll(df1, df2)
+#' }
+#' @rdname intersectAll
+#' @note intersectAll since 2.4.0
+setMethod("intersectAll",
+ signature(x = "SparkDataFrame", y = "SparkDataFrame"),
+ function(x, y) {
+ intersected <- callJMethod(x@sdf, "intersectAll", y@sdf)
+ dataFrame(intersected)
+ })
--- End diff --
add extra empty line after code
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2204/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22107
**[Test build #94789 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94789/testReport)** for PR 22107 at commit [`1d93304`](https://github.com/apache/spark/commit/1d93304290909617c1ddb794f3599907d09cad3d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:
https://github.com/apache/spark/pull/22107
@felixcheung I have incorporated the comments.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/22107#discussion_r210488890
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2876,6 +2905,37 @@ setMethod("except",
dataFrame(excepted)
})
+#' exceptAll
+#'
+#' Return a new SparkDataFrame containing rows in this SparkDataFrame
+#' but not in another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{EXCEPT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the except all operation.
+#' @family SparkDataFrame functions
+#' @aliases exceptAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname exceptAll
+#' @name exceptAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' exceptAllDF <- exceptAll(df1, df2)
+#' }
+#' @rdname exceptAll
+#' @note exceptAll since 2.4.0
+setMethod("exceptAll",
+ signature(x = "SparkDataFrame", y = "SparkDataFrame"),
+ function(x, y) {
+ excepted <- callJMethod(x@sdf, "exceptAll", y@sdf)
+ dataFrame(excepted)
+ })
+
--- End diff --
nit: remove one of the two empty lines
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94844/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2188/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94791/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22107
**[Test build #94844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94844/testReport)** for PR 22107 at commit [`528050d`](https://github.com/apache/spark/commit/528050d14f16a5a44b552c44d49aa733584c21d6).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22107
**[Test build #94764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94764/testReport)** for PR 22107 at commit [`7e88c9d`](https://github.com/apache/spark/commit/7e88c9dd23f0e889f61b6392980363f2b63c0117).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22107
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94789/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22107
**[Test build #94844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94844/testReport)** for PR 22107 at commit [`528050d`](https://github.com/apache/spark/commit/528050d14f16a5a44b552c44d49aa733584c21d6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/22107#discussion_r210490074
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2876,6 +2905,37 @@ setMethod("except",
dataFrame(excepted)
})
+#' exceptAll
+#'
+#' Return a new SparkDataFrame containing rows in this SparkDataFrame
+#' but not in another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{EXCEPT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the except all operation.
+#' @family SparkDataFrame functions
+#' @aliases exceptAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname exceptAll
+#' @name exceptAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' exceptAllDF <- exceptAll(df1, df2)
+#' }
+#' @rdname exceptAll
--- End diff --
@felixcheung Thanks .. Did you want the original function `except` fixed at part of this ?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22107
**[Test build #94760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94760/testReport)** for PR 22107 at commit [`426ffed`](https://github.com/apache/spark/commit/426ffeda2e31ebe60f777ca4c172fa79e5c45f2f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Posted by felixcheungu <gi...@git.apache.org>.
Github user felixcheungu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22107#discussion_r210145398
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2848,6 +2848,35 @@ setMethod("intersect",
dataFrame(intersected)
})
+#' intersectAll
+#'
+#' Return a new SparkDataFrame containing rows in both this SparkDataFrame
+#' and another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{INTERSECT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the intersect all operation.
+#' @family SparkDataFrame functions
+#' @aliases intersectAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname intersectAll
+#' @name intersectAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' intersectAllDF <- intersectAll(df1, df2)
+#' }
+#' @rdname intersectAll
+#' @note intersectAll since 2.4
--- End diff --
please put `2.4.0`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:
https://github.com/apache/spark/pull/22107
Thank you very much @HyukjinKwon @felixcheung
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94764/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2196/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/22107#discussion_r210490145
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2848,6 +2848,35 @@ setMethod("intersect",
dataFrame(intersected)
})
+#' intersectAll
+#'
+#' Return a new SparkDataFrame containing rows in both this SparkDataFrame
+#' and another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{INTERSECT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the intersect all operation.
+#' @family SparkDataFrame functions
+#' @aliases intersectAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname intersectAll
+#' @name intersectAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' intersectAllDF <- intersectAll(df1, df2)
+#' }
+#' @rdname intersectAll
+#' @note intersectAll since 2.4.0
+setMethod("intersectAll",
+ signature(x = "SparkDataFrame", y = "SparkDataFrame"),
+ function(x, y) {
+ intersected <- callJMethod(x@sdf, "intersectAll", y@sdf)
+ dataFrame(intersected)
+ })
--- End diff --
@felixcheung OK.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22107
Seems fine.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22107
**[Test build #94777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94777/testReport)** for PR 22107 at commit [`5247ab5`](https://github.com/apache/spark/commit/5247ab5ef79c7d28db5298aea45b7dcad5ec8ab8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2205/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/22107#discussion_r210488754
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2848,6 +2848,35 @@ setMethod("intersect",
dataFrame(intersected)
})
+#' intersectAll
+#'
+#' Return a new SparkDataFrame containing rows in both this SparkDataFrame
+#' and another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{INTERSECT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the intersect all operation.
+#' @family SparkDataFrame functions
+#' @aliases intersectAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname intersectAll
+#' @name intersectAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' intersectAllDF <- intersectAll(df1, df2)
+#' }
+#' @rdname intersectAll
--- End diff --
ditto here
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22107#discussion_r210157418
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -2482,6 +2482,32 @@ test_that("union(), unionByName(), rbind(), except(), and intersect() on a DataF
unlink(jsonPath2)
})
+test_that("intersectAll() and exceptAll()", {
+ df1 <- createDataFrame(
+ list(list("a", 1),
+ list("a", 1),
+ list("a", 1),
+ list("a", 1),
+ list("b", 3),
+ list("c", 4)),
+ schema = c("a", "b"))
+ df2 <- createDataFrame(
+ list(list("a", 1), list("a", 1), list("b", 3)),
+ schema = c("a", "b"))
+ intersect_all_expected <- data.frame("a" = c("a", "a", "b"), "b" = c(1, 1, 3),
+ stringsAsFactors = FALSE)
+ except_all_expected <- data.frame("a" = c("a", "a", "c"), "b" = c(1, 1, 4),
+ stringsAsFactors = FALSE)
+ intersect_all_df <- arrange(intersectAll(df1, df2), df1$a)
--- End diff --
Strictly, the naming rule is `intersectAllDf` or `intersect.all.df` (see https://github.com/apache/spark/pull/17590#issuecomment-293732796)
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117] Add EXEPT ALL and INTERSECT ALL support in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22107
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94760/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org