You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2018/11/30 09:25:09 UTC

[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...

GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/23186

    [SPARK-26230][SQL]FileIndex: if case sensitive, validate partitions with original column names

    ## What changes were proposed in this pull request?
    
    Partition column name is required to be unique under the same directory. The following paths are invalid partitioned directory:
    ```
    hdfs://host:9000/path/a=1
    hdfs://host:9000/path/b=2
    ```
    
    If case sensitive, the following paths should be invalid too:
    ```
    hdfs://host:9000/path/a=1
    hdfs://host:9000/path/A=2
    ```
    Since column 'a' and 'A' are different, and it is wrong to use either one as the column name in partition schema.
    
    Also, there is a `TODO` comment in the code.
    
    This PR is to resolve the problem.
    
    ## How was this patch tested?
    
    Add unit test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark SPARK-26230

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23186.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23186
    
----
commit 6d052130051a21b9aa7c3ffce56a556bee129a5e
Author: Gengliang Wang <ge...@...>
Date:   2018-11-30T09:23:32Z

     if case sensitive, validate partitions with original column names

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5607/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99589/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    **[Test build #99589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99589/testReport)** for PR 23186 at commit [`88d8f0a`](https://github.com/apache/spark/commit/88d8f0a0bb9b408df6c4135448035336e329f3e5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    retest this pleae


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23186


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    **[Test build #99598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99598/testReport)** for PR 23186 at commit [`88d8f0a`](https://github.com/apache/spark/commit/88d8f0a0bb9b408df6c4135448035336e329f3e5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    **[Test build #99546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99546/testReport)** for PR 23186 at commit [`313366d`](https://github.com/apache/spark/commit/313366d58075daba055c392682abaa01fbb574ee).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    **[Test build #99506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99506/testReport)** for PR 23186 at commit [`6d05213`](https://github.com/apache/spark/commit/6d052130051a21b9aa7c3ffce56a556bee129a5e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    **[Test build #99506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99506/testReport)** for PR 23186 at commit [`6d05213`](https://github.com/apache/spark/commit/6d052130051a21b9aa7c3ffce56a556bee129a5e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23186#discussion_r237888926
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -345,15 +346,18 @@ object PartitioningUtils {
        */
       def resolvePartitions(
           pathsWithPartitionValues: Seq[(Path, PartitionValues)],
    +      caseSensitive: Boolean,
           timeZone: TimeZone): Seq[PartitionValues] = {
         if (pathsWithPartitionValues.isEmpty) {
           Seq.empty
         } else {
    -      // TODO: Selective case sensitivity.
    -      val distinctPartColNames =
    -        pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase())).distinct
    +      val distinctPartColNames = if (caseSensitive) {
    --- End diff --
    
    nit: maybe rename as there is no distinct anymore?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5573/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99546/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5646/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    **[Test build #99598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99598/testReport)** for PR 23186 at commit [`88d8f0a`](https://github.com/apache/spark/commit/88d8f0a0bb9b408df6c4135448035336e329f3e5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99506/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    **[Test build #99589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99589/testReport)** for PR 23186 at commit [`88d8f0a`](https://github.com/apache/spark/commit/88d8f0a0bb9b408df6c4135448035336e329f3e5).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23186#discussion_r238008395
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -345,15 +346,18 @@ object PartitioningUtils {
        */
       def resolvePartitions(
           pathsWithPartitionValues: Seq[(Path, PartitionValues)],
    +      caseSensitive: Boolean,
           timeZone: TimeZone): Seq[PartitionValues] = {
         if (pathsWithPartitionValues.isEmpty) {
           Seq.empty
         } else {
    -      // TODO: Selective case sensitivity.
    -      val distinctPartColNames =
    -        pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase())).distinct
    +      val distinctPartColNames = if (caseSensitive) {
    +        pathsWithPartitionValues.map(_._2.columnNames)
    +      } else {
    +        pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase()))
    +      }
           assert(
    -        distinctPartColNames.size == 1,
    +        distinctPartColNames.distinct.size == 1,
             listConflictingPartitionColumns(pathsWithPartitionValues))
    --- End diff --
    
    yes I see, thanks for the kind explanation.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23186#discussion_r237889346
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -345,15 +346,18 @@ object PartitioningUtils {
        */
       def resolvePartitions(
           pathsWithPartitionValues: Seq[(Path, PartitionValues)],
    +      caseSensitive: Boolean,
           timeZone: TimeZone): Seq[PartitionValues] = {
         if (pathsWithPartitionValues.isEmpty) {
           Seq.empty
         } else {
    -      // TODO: Selective case sensitivity.
    -      val distinctPartColNames =
    -        pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase())).distinct
    +      val distinctPartColNames = if (caseSensitive) {
    +        pathsWithPartitionValues.map(_._2.columnNames)
    +      } else {
    +        pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase()))
    +      }
           assert(
    -        distinctPartColNames.size == 1,
    +        distinctPartColNames.distinct.size == 1,
             listConflictingPartitionColumns(pathsWithPartitionValues))
    --- End diff --
    
    why don't we use `distinctPartColNames` as parameter here? Moreover, is that method working fine according to case sensitivity?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23186#discussion_r237889521
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala ---
    @@ -65,6 +65,34 @@ class FileIndexSuite extends SharedSQLContext {
         }
       }
     
    +  test("SPARK-26230: if case sensitive, validate partitions with original column names") {
    +    withTempDir { dir =>
    +      val partitionDirectory = new File(dir, s"a=1")
    +      partitionDirectory.mkdir()
    +      val file = new File(partitionDirectory, "text.txt")
    +      stringToFile(file, "text")
    +      val partitionDirectory2 = new File(dir, s"A=2")
    +      partitionDirectory2.mkdir()
    +      val file2 = new File(partitionDirectory2, "text.txt")
    +      stringToFile(file2, "text")
    +      val path = new Path(dir.getCanonicalPath)
    +
    +      withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
    +        val fileIndex = new InMemoryFileIndex(spark, Seq(path), Map.empty, None)
    +        val partitionValues = fileIndex.partitionSpec().partitions.map(_.values)
    +        assert(partitionValues.length == 2)
    +      }
    +
    +      withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") {
    +        val msg = intercept[AssertionError] {
    +          val fileIndex = new InMemoryFileIndex(spark, Seq(path), Map.empty, None)
    +          fileIndex.partitionSpec()
    +        }.getMessage
    +        assert(msg.contains("Conflicting partition column names detected"))
    --- End diff --
    
    can we ensure that the message contains the right partitions?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5655/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    **[Test build #99546 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99546/testReport)** for PR 23186 at commit [`313366d`](https://github.com/apache/spark/commit/313366d58075daba055c392682abaa01fbb574ee).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99598/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23186
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23186#discussion_r237963856
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -345,15 +346,18 @@ object PartitioningUtils {
        */
       def resolvePartitions(
           pathsWithPartitionValues: Seq[(Path, PartitionValues)],
    +      caseSensitive: Boolean,
           timeZone: TimeZone): Seq[PartitionValues] = {
         if (pathsWithPartitionValues.isEmpty) {
           Seq.empty
         } else {
    -      // TODO: Selective case sensitivity.
    -      val distinctPartColNames =
    -        pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase())).distinct
    +      val distinctPartColNames = if (caseSensitive) {
    +        pathsWithPartitionValues.map(_._2.columnNames)
    +      } else {
    +        pathsWithPartitionValues.map(_._2.columnNames.map(_.toLowerCase()))
    +      }
           assert(
    -        distinctPartColNames.size == 1,
    +        distinctPartColNames.distinct.size == 1,
             listConflictingPartitionColumns(pathsWithPartitionValues))
    --- End diff --
    
    The method `listConflictingPartitionColumns` also shows the suspicious paths.
    If case sensitive, the method works fine. 
    If case insensitive, it will list all column names without any transformation. e.g. 
    ```
    	Partition column name list #0: a
            Partition column name list #1: A
    	Partition column name list #2: B
    ```
    I can fix the method listConflictingPartitionColumns. But seems a bit trivial, we will have to display the original column names instead of  transforming all to lower case .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23186#discussion_r237990120
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala ---
    @@ -65,6 +65,34 @@ class FileIndexSuite extends SharedSQLContext {
         }
       }
     
    +  test("SPARK-26230: if case sensitive, validate partitions with original column names") {
    +    withTempDir { dir =>
    +      val partitionDirectory = new File(dir, s"a=1")
    --- End diff --
    
    nit. Unnecessary `s`: `s"a=1"` -> `"a=1"`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org