
Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/08/08 12:11:04 UTC

[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/22036

    [SPARK-25028][SQL] Avoid NPE when analyzing partition with NULL values

    ## What changes were proposed in this pull request?
    
    `ANALYZE TABLE ... PARTITION(...) COMPUTE STATISTICS` can fail with an NPE if a partition column contains a NULL value.
    
    This PR avoids the NPE by replacing `NULL` partition values with the default partition placeholder.
    
    ## How was this patch tested?
    
    Added a unit test.
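
For illustration only, here is a minimal standalone sketch of the idea described above, written in plain Scala without any Spark dependency. The object and method names are hypothetical, a grouped result row is modelled as a `Seq[Any]`, and the placeholder string is assumed to match the value of `ExternalCatalogUtils.DEFAULT_PARTITION_NAME`; it is not the code from the patch.

```scala
// Standalone sketch: plain Scala, no Spark classes involved.
object NullPartitionValueSketch {
  // Assumption: same string as ExternalCatalogUtils.DEFAULT_PARTITION_NAME in Spark.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  // Old behaviour, modelled on a plain sequence: calling toString on a
  // null entry throws a NullPointerException.
  def partitionValuesUnsafe(row: Seq[Any], numPartitionColumns: Int): Seq[String] =
    (0 until numPartitionColumns).map(i => row(i).toString)

  // Null-safe variant: a null partition value is replaced by the placeholder.
  def partitionValues(row: Seq[Any], numPartitionColumns: Int): Seq[String] =
    (0 until numPartitionColumns).map { i =>
      if (row(i) == null) DefaultPartitionName else row(i).toString
    }
}
```

With one partition column, a row such as `Seq(null, 2L)` (partition value, count) maps to the placeholder under `partitionValues`, while `partitionValuesUnsafe` throws on the same input.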


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-25028

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22036.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22036
    
----
commit ee64a6ba3d41c0f3d6776d7ccbd9af7185d8a3ad
Author: Marco Gaido <ma...@...>
Date:   2018-08-08T12:07:33Z

    [SPARK-25028][SQL] Avoid NPE when analyzing partition with NULL values

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    **[Test build #94683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94683/testReport)** for PR 22036 at commit [`0470c2d`](https://github.com/apache/spark/commit/0470c2d881d329655e74296f10b29c3d36778471).


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22036#discussion_r209536881
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzePartitionCommand.scala ---
    @@ -140,7 +140,13 @@ case class AnalyzePartitionCommand(
         val df = tableDf.filter(Column(filter)).groupBy(partitionColumns: _*).count()
     
         df.collect().map { r =>
    -      val partitionColumnValues = partitionColumns.indices.map(r.get(_).toString)
    +      val partitionColumnValues = partitionColumns.indices.map { i =>
    +        if (r.isNullAt(i)) {
    +          ExternalCatalogUtils.DEFAULT_PARTITION_NAME
    --- End diff --
    
    I don't think so, as the same situation would arise if Hive's statistics were used instead of the ones computed by Spark.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94423/
    Test PASSed.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    **[Test build #94693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94693/testReport)** for PR 22036 at commit [`0470c2d`](https://github.com/apache/spark/commit/0470c2d881d329655e74296f10b29c3d36778471).


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    cc @cloud-fan @gatorsmile 


---


[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22036#discussion_r209534161
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -204,6 +204,24 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
         }
       }
     
    +  test("SPARK-25028: column stats collection for null partitioning columns") {
    +    val table = "analyze_partition_with_null"
    +    withTempDir { dir =>
    +      withTable(table) {
    +        sql(s"""
    +             |CREATE TABLE $table (name string, value string)
    +             |USING PARQUET
    +             |PARTITIONED BY (name)
    +             |LOCATION '${dir.toURI}'""".stripMargin)
    +        val df = Seq(("a", null), ("b", null)).toDF("value", "name")
    --- End diff --
    
    ok, will do, thanks.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2130/
    Test PASSed.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2138/
    Test PASSed.


---


[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22036


---


[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22036#discussion_r209481344
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzePartitionCommand.scala ---
    @@ -140,7 +140,13 @@ case class AnalyzePartitionCommand(
         val df = tableDf.filter(Column(filter)).groupBy(partitionColumns: _*).count()
     
         df.collect().map { r =>
    -      val partitionColumnValues = partitionColumns.indices.map(r.get(_).toString)
    +      val partitionColumnValues = partitionColumns.indices.map { i =>
    +        if (r.isNullAt(i)) {
    +          ExternalCatalogUtils.DEFAULT_PARTITION_NAME
    --- End diff --
    
    Do we need to change the read path, i.e., where we use these statistics?


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94686/
    Test FAILed.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    **[Test build #94683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94683/testReport)** for PR 22036 at commit [`0470c2d`](https://github.com/apache/spark/commit/0470c2d881d329655e74296f10b29c3d36778471).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    **[Test build #94686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94686/testReport)** for PR 22036 at commit [`0470c2d`](https://github.com/apache/spark/commit/0470c2d881d329655e74296f10b29c3d36778471).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    retest this please


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    **[Test build #94693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94693/testReport)** for PR 22036 at commit [`0470c2d`](https://github.com/apache/spark/commit/0470c2d881d329655e74296f10b29c3d36778471).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2128/
    Test FAILed.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    **[Test build #94686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94686/testReport)** for PR 22036 at commit [`0470c2d`](https://github.com/apache/spark/commit/0470c2d881d329655e74296f10b29c3d36778471).


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94693/
    Test FAILed.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    thanks, merging to master/2.3!


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94683/
    Test PASSed.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    retest this please


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    **[Test build #94423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94423/testReport)** for PR 22036 at commit [`ee64a6b`](https://github.com/apache/spark/commit/ee64a6ba3d41c0f3d6776d7ccbd9af7185d8a3ad).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22036#discussion_r208795446
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -204,6 +204,24 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
         }
       }
     
    +  test("SPARK-25028: column stats collection for null partitioning columns") {
    +    val table = "analyze_partition_with_null"
    +    withTempDir { dir =>
    +      withTable(table) {
    +        sql(s"""
    +             |CREATE TABLE $table (name string, value string)
    +             |USING PARQUET
    +             |PARTITIONED BY (name)
    +             |LOCATION '${dir.toURI}'""".stripMargin)
    +        val df = Seq(("a", null), ("b", null)).toDF("value", "name")
    --- End diff --
    
    super nit: better to add a non-null partition value, e.g., `val df = Seq(("a", null), ("b", null), ("c", "1")).toDF("value", "name")`? btw, why is this a reverse column order (not "name", "value", but "value", "name")?


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1943/
    Test PASSed.


---


[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...

Posted by mgaido91 <gi...@git.apache.org>.

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22036#discussion_r208831510
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -204,6 +204,24 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
         }
       }
     
    +  test("SPARK-25028: column stats collection for null partitioning columns") {
    +    val table = "analyze_partition_with_null"
    +    withTempDir { dir =>
    +      withTable(table) {
    +        sql(s"""
    +             |CREATE TABLE $table (name string, value string)
    +             |USING PARQUET
    +             |PARTITIONED BY (name)
    +             |LOCATION '${dir.toURI}'""".stripMargin)
    +        val df = Seq(("a", null), ("b", null)).toDF("value", "name")
    --- End diff --
    
    I don't think we need to add another partition value: the problem here is `null` causing an NPE, and the test shows that no NPE is thrown. But if you think it is necessary, I can add it.
    
    The reverse column order reflects how Spark works when inserting data into a partitioned table: the partitioning columns are specified at the end, after the non-partitioning ones.
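
To make the column-order point concrete, the following is a rough standalone model in plain Scala (no Spark dependency). It assumes, per the comment above, that partition columns are placed after the non-partition columns and that incoming values are matched to table columns by position; `storedColumnOrder` and `bindByPosition` are hypothetical helper names for illustration, not Spark APIs.

```scala
// Standalone model of position-based insertion into a partitioned table.
object PositionalInsertSketch {
  // Assumed behaviour: partition columns are moved after the other columns,
  // so a table declared as (name, value) partitioned by name is laid out as (value, name).
  def storedColumnOrder(declared: Seq[String], partitionColumns: Seq[String]): Seq[String] =
    declared.filterNot(c => partitionColumns.contains(c)) ++ partitionColumns

  // Values are bound to table columns by position, not by the names of the incoming columns.
  def bindByPosition(tableColumns: Seq[String], rowValues: Seq[Any]): Map[String, Any] = {
    require(tableColumns.length == rowValues.length, "column count mismatch")
    tableColumns.zip(rowValues).toMap
  }
}
```

Under this model, the test row `("a", null)` binds `"a"` to `value` and `null` to the partition column `name`, which is why the DataFrame in the test is built with `toDF("value", "name")`.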


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22036
  
    **[Test build #94423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94423/testReport)** for PR 22036 at commit [`ee64a6b`](https://github.com/apache/spark/commit/ee64a6ba3d41c0f3d6776d7ccbd9af7185d8a3ad).


---


[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22036#discussion_r209481430
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -204,6 +204,24 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
         }
       }
     
    +  test("SPARK-25028: column stats collection for null partitioning columns") {
    +    val table = "analyze_partition_with_null"
    +    withTempDir { dir =>
    +      withTable(table) {
    +        sql(s"""
    +             |CREATE TABLE $table (name string, value string)
    +             |USING PARQUET
    +             |PARTITIONED BY (name)
    +             |LOCATION '${dir.toURI}'""".stripMargin)
    +        val df = Seq(("a", null), ("b", null)).toDF("value", "name")
    --- End diff --
    
    When creating the table, we can put the partition column at the end to avoid this confusion.


---
