You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2018/05/24 18:07:50 UTC

[GitHub] spark pull request #21425: [SPARK-24381][TESTING] Add unit tests for NOT IN ...

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21425#discussion_r190680216
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
    @@ -275,6 +277,135 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
     
       }
     
    +  // ``col NOT IN expr'' is quite difficult to reason about. There are many edge cases, some of the
    +  // rules are not very intuitive, and precedence and treatment of null values is somewhat
    +  // unintuitive. To make this simpler to understand, I've come up with a plain English way of
    +  // describing the expected behavior of this query.
    +  //
    +  // - If the subquery is empty (i.e. returns no rows), the row should be returned, regardless of
    +  //   whether the filtered columns include nulls.
    +  // - If the subquery contains a result with all nulls, then the row should not be returned.
    +  // - If for all non-null filter columns there exists a row in the subquery in which each column
    +  //   either
    +  //   1. is equal to the corresponding filter column or
    +  //   2. is null
    +  //   then the row should not be returned. (This includes the case where all filter columns are
    +  //   null.)
    +  // - Otherwise, the row should be returned.
    +  //
    +  // Using these rules, we can come up with a set of test cases for single-column and multi-column
    +  // NOT IN test cases.
    +  test("NOT IN single column with nulls predicate subquery") {
    +    // Test cases for single-column ``WHERE a NOT IN (SELECT c FROM r ...)'':
    +    // | # | does subquery include null? | is a null? | a = c? | row with a included in result? |
    +    // | 1 | empty                       |            |        | yes                            |
    +    // | 2 | yes                         |            |        | no                             |
    +    // | 3 | no                          | yes        |        | no                             |
    +    // | 4 | no                          | no         | yes    | no                             |
    +    // | 5 | no                          | no         | no     | yes                            |
    +    Seq(row((null, 5.0)), row((3, 3.0))).toDF("a", "b").createOrReplaceTempView("m")
    --- End diff --
    
    Please use `withTempView` to ensure these views are dropped. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org