You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2018/05/24 18:07:50 UTC
[GitHub] spark pull request #21425: [SPARK-24381][TESTING] Add unit tests for NOT IN ...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21425#discussion_r190680216
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
@@ -275,6 +277,135 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
}
+ // ``col NOT IN expr'' is quite difficult to reason about. There are many edge cases, some of the
+ // rules are not very intuitive, and precedence and treatment of null values is somewhat
+ // unintuitive. To make this simpler to understand, I've come up with a plain English way of
+ // describing the expected behavior of this query.
+ //
+ // - If the subquery is empty (i.e. returns no rows), the row should be returned, regardless of
+ // whether the filtered columns include nulls.
+ // - If the subquery contains a result with all nulls, then the row should not be returned.
+ // - If for all non-null filter columns there exists a row in the subquery in which each column
+ // either
+ // 1. is equal to the corresponding filter column or
+ // 2. is null
+ // then the row should not be returned. (This includes the case where all filter columns are
+ // null.)
+ // - Otherwise, the row should be returned.
+ //
+ // Using these rules, we can come up with a set of test cases for single-column and multi-column
+ // NOT IN test cases.
+ test("NOT IN single column with nulls predicate subquery") {
+ // Test cases for single-column ``WHERE a NOT IN (SELECT c FROM r ...)'':
+ // | # | does subquery include null? | is a null? | a = c? | row with a included in result? |
+ // | 1 | empty | | | yes |
+ // | 2 | yes | | | no |
+ // | 3 | no | yes | | no |
+ // | 4 | no | no | yes | no |
+ // | 5 | no | no | no | yes |
+ Seq(row((null, 5.0)), row((3, 3.0))).toDF("a", "b").createOrReplaceTempView("m")
--- End diff --
Please use `withTempView` to ensure these views are dropped.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org