Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/08/08 12:11:04 UTC
[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...
GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/22036
[SPARK-25028][SQL] Avoid NPE when analyzing partition with NULL values
## What changes were proposed in this pull request?
`ANALYZE TABLE ... PARTITION(...) COMPUTE STATISTICS` can fail with an NPE if a partition column contains a NULL value.
This PR avoids the NPE by replacing `NULL` values with the default partition placeholder.
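The core of the change can be sketched in plain Scala, without Spark on the classpath. This is a hedged illustration, not the patch itself: a grouped row is modelled as a `Seq[Any]`, and `DefaultPartitionName` is a local constant assumed to mirror `ExternalCatalogUtils.DEFAULT_PARTITION_NAME` (`__HIVE_DEFAULT_PARTITION__`). `NullPartitionValues`, `before` and `after` are hypothetical names.

```scala
object NullPartitionValues {
  // Assumed to mirror ExternalCatalogUtils.DEFAULT_PARTITION_NAME in Spark.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  // Old behaviour: toString is called on every value, so a null value
  // throws a NullPointerException.
  def before(row: Seq[Any]): Seq[String] =
    row.indices.map(i => row(i).toString)

  // Patched behaviour: a null partition value is replaced by the
  // default partition placeholder instead of being dereferenced.
  def after(row: Seq[Any]): Seq[String] =
    row.indices.map { i =>
      if (row(i) == null) DefaultPartitionName else row(i).toString
    }
}
```

With the input `Seq(null, "1")`, `after` yields the values `__HIVE_DEFAULT_PARTITION__` and `1`, while `before` throws on the first element.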
## How was this patch tested?
Added a UT.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mgaido91/spark SPARK-25028
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22036.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22036
----
commit ee64a6ba3d41c0f3d6776d7ccbd9af7185d8a3ad
Author: Marco Gaido <ma...@...>
Date: 2018-08-08T12:07:33Z
[SPARK-25028][SQL] Avoid NPE when analyzing partition with NULL values
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22036
**[Test build #94683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94683/testReport)** for PR 22036 at commit [`0470c2d`](https://github.com/apache/spark/commit/0470c2d881d329655e74296f10b29c3d36778471).
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22036#discussion_r209536881
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzePartitionCommand.scala ---
@@ -140,7 +140,13 @@ case class AnalyzePartitionCommand(
val df = tableDf.filter(Column(filter)).groupBy(partitionColumns: _*).count()
df.collect().map { r =>
- val partitionColumnValues = partitionColumns.indices.map(r.get(_).toString)
+ val partitionColumnValues = partitionColumns.indices.map { i =>
+ if (r.isNullAt(i)) {
+ ExternalCatalogUtils.DEFAULT_PARTITION_NAME
--- End diff --
I don't think so, as the same situation would occur if Hive's statistics were used instead of the ones computed by Spark.
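A hedged sketch of why the placeholder may need no read-path change: it assumes, rather than quotes from Spark, that the computed row counts are keyed by a partition spec (a `Map[String, String]` from column name to value) and later matched against the catalog's partition specs, in which a NULL partition is assumed to be recorded under `__HIVE_DEFAULT_PARTITION__`. `PartitionSpecs`, `toSpec` and `rowCountFor` are hypothetical names.

```scala
object PartitionSpecs {
  // Assumed to mirror ExternalCatalogUtils.DEFAULT_PARTITION_NAME.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  type Spec = Map[String, String]

  // Build a spec from the partition column names and the (possibly null)
  // values of one grouped row, substituting the placeholder for null.
  def toSpec(columns: Seq[String], values: Seq[Any]): Spec =
    columns.zip(values).map { case (col, v) =>
      col -> (if (v == null) DefaultPartitionName else v.toString)
    }.toMap

  // Hypothetical read-side lookup: the count computed for a catalog spec, if any.
  def rowCountFor(catalogSpec: Spec, counts: Map[Spec, Long]): Option[Long] =
    counts.get(catalogSpec)
}
```

Under these assumptions, a catalog spec such as `Map("name" -> "__HIVE_DEFAULT_PARTITION__")` finds the count computed for the NULL group without any special handling on the lookup side.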
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94423/
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22036
**[Test build #94693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94693/testReport)** for PR 22036 at commit [`0470c2d`](https://github.com/apache/spark/commit/0470c2d881d329655e74296f10b29c3d36778471).
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/22036
cc @cloud-fan @gatorsmile
---
[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22036#discussion_r209534161
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -204,6 +204,24 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
}
}
+ test("SPARK-25028: column stats collection for null partitioning columns") {
+ val table = "analyze_partition_with_null"
+ withTempDir { dir =>
+ withTable(table) {
+ sql(s"""
+ |CREATE TABLE $table (name string, value string)
+ |USING PARQUET
+ |PARTITIONED BY (name)
+ |LOCATION '${dir.toURI}'""".stripMargin)
+ val df = Seq(("a", null), ("b", null)).toDF("value", "name")
--- End diff --
ok, will do, thanks.
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2130/
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2138/
---
[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22036
---
[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22036#discussion_r209481344
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzePartitionCommand.scala ---
@@ -140,7 +140,13 @@ case class AnalyzePartitionCommand(
val df = tableDf.filter(Column(filter)).groupBy(partitionColumns: _*).count()
df.collect().map { r =>
- val partitionColumnValues = partitionColumns.indices.map(r.get(_).toString)
+ val partitionColumnValues = partitionColumns.indices.map { i =>
+ if (r.isNullAt(i)) {
+ ExternalCatalogUtils.DEFAULT_PARTITION_NAME
--- End diff --
Do we need to change the read path, i.e. where we use these statistics?
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94686/
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22036
**[Test build #94683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94683/testReport)** for PR 22036 at commit [`0470c2d`](https://github.com/apache/spark/commit/0470c2d881d329655e74296f10b29c3d36778471).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22036
**[Test build #94686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94686/testReport)** for PR 22036 at commit [`0470c2d`](https://github.com/apache/spark/commit/0470c2d881d329655e74296f10b29c3d36778471).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22036
retest this please
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22036
**[Test build #94693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94693/testReport)** for PR 22036 at commit [`0470c2d`](https://github.com/apache/spark/commit/0470c2d881d329655e74296f10b29c3d36778471).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2128/
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22036
**[Test build #94686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94686/testReport)** for PR 22036 at commit [`0470c2d`](https://github.com/apache/spark/commit/0470c2d881d329655e74296f10b29c3d36778471).
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94693/
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22036
thanks, merging to master/2.3!
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94683/
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22036
retest this please
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22036
**[Test build #94423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94423/testReport)** for PR 22036 at commit [`ee64a6b`](https://github.com/apache/spark/commit/ee64a6ba3d41c0f3d6776d7ccbd9af7185d8a3ad).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22036#discussion_r208795446
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -204,6 +204,24 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
}
}
+ test("SPARK-25028: column stats collection for null partitioning columns") {
+ val table = "analyze_partition_with_null"
+ withTempDir { dir =>
+ withTable(table) {
+ sql(s"""
+ |CREATE TABLE $table (name string, value string)
+ |USING PARQUET
+ |PARTITIONED BY (name)
+ |LOCATION '${dir.toURI}'""".stripMargin)
+ val df = Seq(("a", null), ("b", null)).toDF("value", "name")
--- End diff --
super nit: would it be better to add a non-null partition value, e.g., `val df = Seq(("a", null), ("b", null), ("c", "1")).toDF("value", "name")`? BTW, why is the column order reversed ("value", "name" instead of "name", "value")?
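The suggested data, mixing NULL and non-NULL partition values, can be traced through a plain-Scala stand-in for the `groupBy(partitionColumns: _*).count()` step. This sketch assumes rows are `(value, name)` pairs with `name` as the partition column; `RowCountsSketch` is hypothetical and does not use Spark. Unlike Spark, which groups first and substitutes the placeholder afterwards, it substitutes before grouping, but the resulting counts are the same.

```scala
object RowCountsSketch {
  // Assumed to mirror ExternalCatalogUtils.DEFAULT_PARTITION_NAME.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  // Replace a null partition value with the placeholder, then count rows
  // per partition value.
  def rowCounts(rows: Seq[(String, String)]): Map[String, Long] =
    rows
      .map { case (_, name) => Option(name).getOrElse(DefaultPartitionName) }
      .groupBy(identity)
      .map { case (partition, hits) => partition -> hits.size.toLong }
}
```

For the suggested input, both NULL rows land in the placeholder partition and the non-NULL row keeps its own partition, so a test with this data also checks that ordinary partitions are unaffected.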
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22036
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1943/
---
[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22036#discussion_r208831510
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -204,6 +204,24 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
}
}
+ test("SPARK-25028: column stats collection for null partitioning columns") {
+ val table = "analyze_partition_with_null"
+ withTempDir { dir =>
+ withTable(table) {
+ sql(s"""
+ |CREATE TABLE $table (name string, value string)
+ |USING PARQUET
+ |PARTITIONED BY (name)
+ |LOCATION '${dir.toURI}'""".stripMargin)
+ val df = Seq(("a", null), ("b", null)).toDF("value", "name")
--- End diff --
I don't think it is necessary to add another partition value, as the problem here is `null` throwing an NPE, and the test shows that no NPE is thrown. But if you think it is necessary, I can add it.
The reversed column order is how Spark works when inserting data into a partitioned table: the partitioning columns are specified at the end, after the non-partitioning ones.
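The column order described above can be sketched as follows, assuming (as the comment states) that a partitioned table expects its partition columns after the data columns when data is inserted by position. `ColumnOrder.insertionOrder` is a hypothetical helper, not a Spark API.

```scala
object ColumnOrder {
  // Data columns keep their declared order; partition columns are moved
  // to the end, in the order given by PARTITIONED BY.
  def insertionOrder(declared: Seq[String], partitionCols: Seq[String]): Seq[String] =
    declared.filterNot(c => partitionCols.contains(c)) ++ partitionCols
}
```

For the test's table, declared as `(name string, value string)` and `PARTITIONED BY (name)`, this gives `value, name`, which matches `toDF("value", "name")`.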
---
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22036
**[Test build #94423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94423/testReport)** for PR 22036 at commit [`ee64a6b`](https://github.com/apache/spark/commit/ee64a6ba3d41c0f3d6776d7ccbd9af7185d8a3ad).
---
[GitHub] spark pull request #22036: [SPARK-25028][SQL] Avoid NPE when analyzing parti...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22036#discussion_r209481430
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -204,6 +204,24 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
}
}
+ test("SPARK-25028: column stats collection for null partitioning columns") {
+ val table = "analyze_partition_with_null"
+ withTempDir { dir =>
+ withTable(table) {
+ sql(s"""
+ |CREATE TABLE $table (name string, value string)
+ |USING PARQUET
+ |PARTITIONED BY (name)
+ |LOCATION '${dir.toURI}'""".stripMargin)
+ val df = Seq(("a", null), ("b", null)).toDF("value", "name")
--- End diff --
When creating the table, we can put the partition column at the end to avoid this confusion.
---