You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2018/11/30 09:25:09 UTC
[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...
GitHub user gengliangwang opened a pull request:
https://github.com/apache/spark/pull/23186
[SPARK-26230][SQL]FileIndex: if case sensitive, validate partitions with original column names
## What changes were proposed in this pull request?
Partition column name is required to be unique under the same directory. The following paths are invalid partitioned directory:
```
hdfs://host:9000/path/a=1
hdfs://host:9000/path/b=2
```
If case sensitive, the following paths should be invalid too:
```
hdfs://host:9000/path/a=1
hdfs://host:9000/path/A=2
```
Since column 'a' and 'A' are different, and it is wrong to use either one as the column name in partition schema.
Also, there is a `TODO` comment in the code.
This PR is to resolve the problem.
## How was this patch tested?
Add unit test
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gengliangwang/spark SPARK-26230
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23186.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #23186
----
commit 6d052130051a21b9aa7c3ffce56a556bee129a5e
Author: Gengliang Wang <ge...@...>
Date: 2018-11-30T09:23:32Z
if case sensitive, validate partitions with original column names
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5607/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99589/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/23186
LGTM
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23186
**[Test build #99589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99589/testReport)** for PR 23186 at commit [`88d8f0a`](https://github.com/apache/spark/commit/88d8f0a0bb9b408df6c4135448035336e329f3e5).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/23186
retest this pleae
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/23186
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23186
**[Test build #99598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99598/testReport)** for PR 23186 at commit [`88d8f0a`](https://github.com/apache/spark/commit/88d8f0a0bb9b408df6c4135448035336e329f3e5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23186
**[Test build #99546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99546/testReport)** for PR 23186 at commit [`313366d`](https://github.com/apache/spark/commit/313366d58075daba055c392682abaa01fbb574ee).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23186
**[Test build #99506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99506/testReport)** for PR 23186 at commit [`6d05213`](https://github.com/apache/spark/commit/6d052130051a21b9aa7c3ffce56a556bee129a5e).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23186
**[Test build #99506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99506/testReport)** for PR 23186 at commit [`6d05213`](https://github.com/apache/spark/commit/6d052130051a21b9aa7c3ffce56a556bee129a5e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/23186#discussion_r237888926
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -345,15 +346,18 @@ object PartitioningUtils {
*/
def resolvePartitions(
pathsWithPartitionValues: Seq[(Path, PartitionValues)],
+ caseSensitive: Boolean,
timeZone: TimeZone): Seq[PartitionValues] = {
if (pathsWithPartitionValues.isEmpty) {
Seq.empty
} else {
- // TODO: Selective case sensitivity.
- val distinctPartColNames =
- pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase())).distinct
+ val distinctPartColNames = if (caseSensitive) {
--- End diff --
nit: maybe rename as there is no distinct anymore?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5573/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:
https://github.com/apache/spark/pull/23186
@cloud-fan
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99546/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5646/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23186
**[Test build #99598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99598/testReport)** for PR 23186 at commit [`88d8f0a`](https://github.com/apache/spark/commit/88d8f0a0bb9b408df6c4135448035336e329f3e5).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99506/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23186
**[Test build #99589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99589/testReport)** for PR 23186 at commit [`88d8f0a`](https://github.com/apache/spark/commit/88d8f0a0bb9b408df6c4135448035336e329f3e5).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/23186#discussion_r238008395
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -345,15 +346,18 @@ object PartitioningUtils {
*/
def resolvePartitions(
pathsWithPartitionValues: Seq[(Path, PartitionValues)],
+ caseSensitive: Boolean,
timeZone: TimeZone): Seq[PartitionValues] = {
if (pathsWithPartitionValues.isEmpty) {
Seq.empty
} else {
- // TODO: Selective case sensitivity.
- val distinctPartColNames =
- pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase())).distinct
+ val distinctPartColNames = if (caseSensitive) {
+ pathsWithPartitionValues.map(_._2.columnNames)
+ } else {
+ pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase()))
+ }
assert(
- distinctPartColNames.size == 1,
+ distinctPartColNames.distinct.size == 1,
listConflictingPartitionColumns(pathsWithPartitionValues))
--- End diff --
yes I see, thanks for the kind explanation.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/23186#discussion_r237889346
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -345,15 +346,18 @@ object PartitioningUtils {
*/
def resolvePartitions(
pathsWithPartitionValues: Seq[(Path, PartitionValues)],
+ caseSensitive: Boolean,
timeZone: TimeZone): Seq[PartitionValues] = {
if (pathsWithPartitionValues.isEmpty) {
Seq.empty
} else {
- // TODO: Selective case sensitivity.
- val distinctPartColNames =
- pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase())).distinct
+ val distinctPartColNames = if (caseSensitive) {
+ pathsWithPartitionValues.map(_._2.columnNames)
+ } else {
+ pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase()))
+ }
assert(
- distinctPartColNames.size == 1,
+ distinctPartColNames.distinct.size == 1,
listConflictingPartitionColumns(pathsWithPartitionValues))
--- End diff --
why don't we use `distinctPartColNames` as parameter here? Moreover, is that method working fine according to case sensitivity?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/23186#discussion_r237889521
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala ---
@@ -65,6 +65,34 @@ class FileIndexSuite extends SharedSQLContext {
}
}
+ test("SPARK-26230: if case sensitive, validate partitions with original column names") {
+ withTempDir { dir =>
+ val partitionDirectory = new File(dir, s"a=1")
+ partitionDirectory.mkdir()
+ val file = new File(partitionDirectory, "text.txt")
+ stringToFile(file, "text")
+ val partitionDirectory2 = new File(dir, s"A=2")
+ partitionDirectory2.mkdir()
+ val file2 = new File(partitionDirectory2, "text.txt")
+ stringToFile(file2, "text")
+ val path = new Path(dir.getCanonicalPath)
+
+ withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
+ val fileIndex = new InMemoryFileIndex(spark, Seq(path), Map.empty, None)
+ val partitionValues = fileIndex.partitionSpec().partitions.map(_.values)
+ assert(partitionValues.length == 2)
+ }
+
+ withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") {
+ val msg = intercept[AssertionError] {
+ val fileIndex = new InMemoryFileIndex(spark, Seq(path), Map.empty, None)
+ fileIndex.partitionSpec()
+ }.getMessage
+ assert(msg.contains("Conflicting partition column names detected"))
--- End diff --
can we ensure that the message contains the right partitions?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5655/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23186
**[Test build #99546 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99546/testReport)** for PR 23186 at commit [`313366d`](https://github.com/apache/spark/commit/313366d58075daba055c392682abaa01fbb574ee).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99598/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/23186
thanks, merging to master!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23186
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/23186#discussion_r237963856
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -345,15 +346,18 @@ object PartitioningUtils {
*/
def resolvePartitions(
pathsWithPartitionValues: Seq[(Path, PartitionValues)],
+ caseSensitive: Boolean,
timeZone: TimeZone): Seq[PartitionValues] = {
if (pathsWithPartitionValues.isEmpty) {
Seq.empty
} else {
- // TODO: Selective case sensitivity.
- val distinctPartColNames =
- pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase())).distinct
+ val distinctPartColNames = if (caseSensitive) {
+ pathsWithPartitionValues.map(_._2.columnNames)
+ } else {
+ pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase()))
+ }
assert(
- distinctPartColNames.size == 1,
+ distinctPartColNames.distinct.size == 1,
listConflictingPartitionColumns(pathsWithPartitionValues))
--- End diff --
The method `listConflictingPartitionColumns` also shows the suspicious paths.
If case sensitive, the method works fine.
If case insensitive, it will list all column names without any transformation. e.g.
```
Partition column name list #0: a
Partition column name list #1: A
Partition column name list #2: B
```
I can fix the method listConflictingPartitionColumns. But seems a bit trivial, we will have to display the original column names instead of transforming all to lower case .
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23186#discussion_r237990120
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala ---
@@ -65,6 +65,34 @@ class FileIndexSuite extends SharedSQLContext {
}
}
+ test("SPARK-26230: if case sensitive, validate partitions with original column names") {
+ withTempDir { dir =>
+ val partitionDirectory = new File(dir, s"a=1")
--- End diff --
nit. Unnecessary `s`: `s"a=1"` -> `"a=1"`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org