You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/23 05:45:29 UTC

[GitHub] [spark] sadikovi opened a new pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

sadikovi opened a new pull request #34995:
URL: https://github.com/apache/spark/pull/34995


   ### What changes were proposed in this pull request?
   
   This PR escapes `.` character in partition names as certain file systems (for example, ABFS) may not support it.
   
   I simply added the `.` character to the list of special characters in `ExternalCatalogUtils`. The class already covers backward compatibility so it would read the existing unescaped values with dots correctly.
   
   ### Why are the changes needed?
   Fixes an issue when ABFS throws `Caused by: java.lang.IllegalArgumentException: ABFS does not allow files or directories to end with a dot.` exception if a partition name contains `.` symbol.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   There should also be no backward compatibility issues since ExternalCatalogUtils allows reading both escaped and unescaped values.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   
   Added a unit test in `InsertSuite` to confirm that dots are now escaped.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000060949


   **[Test build #146502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146502/testReport)** for PR 34995 at commit [`6c593f3`](https://github.com/apache/spark/commit/6c593f3ea26015bbb4db74e8e0cf198bf1c93f94).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000587781


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/51014/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi edited a comment on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
sadikovi edited a comment on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1002391822


   Well, it actually is affected. When I run the following commands in my branch: 
   ```
   CREATE TABLE escapeDots (key int) USING parquet PARTITIONED BY (value string);
   INSERT INTO TABLE escapeDots PARTITION(value = 'blah/') VALUES (1);
   INSERT INTO TABLE escapeDots PARTITION(value = 'blah{}') VALUES (2);
   ```
   
   I get the following output for `SHOW PARTITIONS escapeDots;`: 
   ```
   +--------------+
   |     partition|
   +--------------+
   | value=blah%2F|
   |value=blah%7B}|
   +--------------+
   ```
   
   Seems there are several issues in partitioning code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1009562367


   @sadikovi is this PR unblocked now?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
sadikovi commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1009563327


   Apologies @cloud-fan. I have not had time to follow up on the test failures and comments. I will get back to this shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34995:
URL: https://github.com/apache/spark/pull/34995#discussion_r774595441



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala
##########
@@ -225,6 +225,32 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter
     checkAnswer(sql(selQuery), Row(5, 2, 3, 6))
   }
 
+  test("SPARK-37722: Escape dots in partition names") {
+    withTable("escapeDots") {
+      withSQLConf("hive.exec.dynamic.partition.mode" -> "nonstrict") {
+        sql("CREATE TABLE escapeDots (key int) PARTITIONED BY (value string);")
+        sql("INSERT INTO TABLE escapeDots VALUES (1, 'text.');")
+        sql("INSERT INTO TABLE escapeDots VALUES (2, '.');")

Review comment:
       can we test static partitions? `INSERT INTO TABLE escapeDots PARTITION(value = '.') VALUES (4)`

##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala
##########
@@ -225,6 +225,32 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter
     checkAnswer(sql(selQuery), Row(5, 2, 3, 6))
   }
 
+  test("SPARK-37722: Escape dots in partition names") {
+    withTable("escapeDots") {
+      withSQLConf("hive.exec.dynamic.partition.mode" -> "nonstrict") {
+        sql("CREATE TABLE escapeDots (key int) PARTITIONED BY (value string);")
+        sql("INSERT INTO TABLE escapeDots VALUES (1, 'text.');")
+        sql("INSERT INTO TABLE escapeDots VALUES (2, '.');")

Review comment:
       can we test static partitions? `INSERT INTO TABLE escapeDots PARTITION(value = '.') VALUES (2)`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34995:
URL: https://github.com/apache/spark/pull/34995#discussion_r774594795



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala
##########
@@ -225,6 +225,32 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter
     checkAnswer(sql(selQuery), Row(5, 2, 3, 6))
   }
 
+  test("SPARK-37722: Escape dots in partition names") {

Review comment:
       Since it's not related to hive, shall we move the test to `org.apache.spark.sql.sources.InsertSuite`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000101066


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50978/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #34995:
URL: https://github.com/apache/spark/pull/34995#discussion_r776076784



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
##########
@@ -52,7 +52,7 @@ object ExternalCatalogUtils {
       '\n', '\u000B', '\u000C', '\r', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012', '\u0013',
       '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019', '\u001A', '\u001B', '\u001C',
       '\u001D', '\u001E', '\u001F', '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F',
-      '{', '[', ']', '^')
+      '{', '[', ']', '^', '.')

Review comment:
       It seems that Hive has the identical code. What happens between Hive and Spark after this PR?
   
   https://github.com/apache/hive/blob/9857c4e584384f7b0a49c34bc2bdf876c2ea1503/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L251




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1002287620


   cc @sunchao 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on a change in pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
sadikovi commented on a change in pull request #34995:
URL: https://github.com/apache/spark/pull/34995#discussion_r776119783



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
##########
@@ -52,7 +52,7 @@ object ExternalCatalogUtils {
       '\n', '\u000B', '\u000C', '\r', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012', '\u0013',
       '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019', '\u001A', '\u001B', '\u001C',
       '\u001D', '\u001E', '\u001F', '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F',
-      '{', '[', ']', '^')
+      '{', '[', ']', '^', '.')

Review comment:
       I thought the code would translate any escaped values, so it should not matter what characters are encoded. As long as decoding code is generic enough to handle all escaped values, older Spark versions should be able to read it. I will verify this. For now, there is an issue in partitioning that prevents the PR to pass the tests.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi edited a comment on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
sadikovi edited a comment on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1002391822


   Well, it actually is affected. When I run the following commands in my branch: 
   ```
   CREATE TABLE escapeDots (key int) USING parquet PARTITIONED BY (value string);
   INSERT INTO TABLE escapeDots PARTITION(value = 'blah/') VALUES (1);
   INSERT INTO TABLE escapeDots PARTITION(value = 'blah{}') VALUES (2);
   ```
   
   I get the following output for `SHOW PARTITIONS escapeDots;`: 
   ```
   +--------------+
   |     partition|
   +--------------+
   | value=blah%2F|
   |value=blah%7B}|
   +--------------+
   ```
   
   Seems there are several issues in partitioning code which are not related to my changes. Do you know if there are any tests for "SHOW PARTITIONS" that verify the escaped values?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000592334


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146539/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000113808


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146502/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1002400892


   We have `ShowPartitionsSuiteBase`, which doesn't test the escaping behavior. @sadikovi can you add a new test case there and fix `SHOW PARTITIONS`? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
sadikovi commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1002385598


   I updated the PR description. Due to double values not applying `unescapePathName` method to the raw value, this could potentially break backward compatibility for double values as partition values + user-facing changes in SHOW PARTITIONS. 
   
   @cloud-fan @dongjoon-hyun let me know if this is a concern, I imagine it should be possible to backport the change to older Spark versions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000565580


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51014/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000592334


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146539/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on a change in pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
sadikovi commented on a change in pull request #34995:
URL: https://github.com/apache/spark/pull/34995#discussion_r774797055



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala
##########
@@ -225,6 +225,32 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter
     checkAnswer(sql(selQuery), Row(5, 2, 3, 6))
   }
 
+  test("SPARK-37722: Escape dots in partition names") {

Review comment:
       I moved the test, thanks for pointing it out.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1001062270


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1001469177


   > I think there is a bug in the partitioning cast where the value is inferred using the raw value which could contain escaped characters.
   
   Can you open a new PR to fix this bug? We can discuss the details there. I don't think it's a known issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
sadikovi commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1002405648


   I will open a separate PR to fix it soon-ish, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1002285450


   +1 for @cloud-fan 's request. We need to a new JIRA for that, @sadikovi .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000113616


   **[Test build #146502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146502/testReport)** for PR 34995 at commit [`6c593f3`](https://github.com/apache/spark/commit/6c593f3ea26015bbb4db74e8e0cf198bf1c93f94).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000110569


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50978/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000554251


   **[Test build #146539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146539/testReport)** for PR 34995 at commit [`8a402b9`](https://github.com/apache/spark/commit/8a402b9a36901360e0f91c4f4824ade001bfa7de).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000060949


   **[Test build #146502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146502/testReport)** for PR 34995 at commit [`6c593f3`](https://github.com/apache/spark/commit/6c593f3ea26015bbb4db74e8e0cf198bf1c93f94).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000581399


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51014/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on a change in pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
sadikovi commented on a change in pull request #34995:
URL: https://github.com/apache/spark/pull/34995#discussion_r776091154



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
##########
@@ -52,7 +52,7 @@ object ExternalCatalogUtils {
       '\n', '\u000B', '\u000C', '\r', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012', '\u0013',
       '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019', '\u001A', '\u001B', '\u001C',
       '\u001D', '\u001E', '\u001F', '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F',
-      '{', '[', ']', '^')
+      '{', '[', ']', '^', '.')

Review comment:
       I don't know what happens in Hive as I don't know how to test that code, I am happy to verify the changes there if someone can explain how to test these changes in Hive. 
   
   My understanding from reading https://github.com/apache/hive/blob/9857c4e584384f7b0a49c34bc2bdf876c2ea1503/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L300 is that any escaped sequence will be transformed correctly regardless of `clist` - it is only used for writes.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
sadikovi commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1002305771


   I have not tested it but it seems Hive would have the same bug writing partitions that end with `.` character to ABFS.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on a change in pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
sadikovi commented on a change in pull request #34995:
URL: https://github.com/apache/spark/pull/34995#discussion_r776143980



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
##########
@@ -52,7 +52,7 @@ object ExternalCatalogUtils {
       '\n', '\u000B', '\u000C', '\r', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012', '\u0013',
       '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019', '\u001A', '\u001B', '\u001C',
       '\u001D', '\u001E', '\u001F', '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F',
-      '{', '[', ']', '^')
+      '{', '[', ']', '^', '.')

Review comment:
       I updated the PR description as I decided to keep all of the changes in one PR. The patch does introduce backward incompatible changes when writing double values as partition values.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000554251


   **[Test build #146539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146539/testReport)** for PR 34995 at commit [`8a402b9`](https://github.com/apache/spark/commit/8a402b9a36901360e0f91c4f4824ade001bfa7de).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000587781


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/51014/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi edited a comment on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
sadikovi edited a comment on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000545069


   I think there is  a bug in the partitioning cast where the value is inferred using the raw value which could contain escaped characters. We escape column name but not the value! For example, if you have double value 4.5, you end up with trying to infer double from `4%2E5`. See https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L307-L311 and https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L491. Timestamps escape value separately in the "inferPartitionColumnValue" method.
   
   IMHO, "inferPartitionColumnValue" method should already take the actual value that was unescaped, not the raw one. Because of this issue, this PR introduces breaking changes as type inference could be incorrect in doubles and decimals.
   
   @cloud-fan is this a known issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34995:
URL: https://github.com/apache/spark/pull/34995#discussion_r776110848



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
##########
@@ -52,7 +52,7 @@ object ExternalCatalogUtils {
       '\n', '\u000B', '\u000C', '\r', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012', '\u0013',
       '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019', '\u001A', '\u001B', '\u001C',
       '\u001D', '\u001E', '\u001F', '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F',
-      '{', '[', ']', '^')
+      '{', '[', ']', '^', '.')

Review comment:
       > There should also be no backward compatibility issues since ExternalCatalogUtils allows reading both escaped and unescaped values.
   
   I guess reading Hive tables are fine but it might break for Hive to read Spark's tables written.
   
   Shall we better just add a configuration to control this? Also, I think we should better leave a comment that this part is different from Hive's (since we have comments above "The following string escaping code is mainly copied from Hive (o.a.h.h.common.FileUtils).").




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34995:
URL: https://github.com/apache/spark/pull/34995#discussion_r776120196



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
##########
@@ -52,7 +52,7 @@ object ExternalCatalogUtils {
       '\n', '\u000B', '\u000C', '\r', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012', '\u0013',
       '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019', '\u001A', '\u001B', '\u001C',
       '\u001D', '\u001E', '\u001F', '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F',
-      '{', '[', ']', '^')
+      '{', '[', ']', '^', '.')

Review comment:
       That would be great! If it can read/write these cases, I think that addresses @dongjoon-hyun's concern, and the change should be safe to go.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi edited a comment on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
sadikovi edited a comment on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1002305771


   I have not tested it in Hive but it seems Hive would have the same bug writing partitions that end with `.` character to ABFS.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #34995:
URL: https://github.com/apache/spark/pull/34995#discussion_r776076784



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
##########
@@ -52,7 +52,7 @@ object ExternalCatalogUtils {
       '\n', '\u000B', '\u000C', '\r', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012', '\u0013',
       '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019', '\u001A', '\u001B', '\u001C',
       '\u001D', '\u001E', '\u001F', '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F',
-      '{', '[', ']', '^')
+      '{', '[', ']', '^', '.')

Review comment:
       For me, it seems that Hive has the identical code.
   
   https://github.com/apache/hive/blob/9857c4e584384f7b0a49c34bc2bdf876c2ea1503/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L251




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #34995:
URL: https://github.com/apache/spark/pull/34995#discussion_r776076164



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
##########
@@ -52,7 +52,7 @@ object ExternalCatalogUtils {
       '\n', '\u000B', '\u000C', '\r', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012', '\u0013',
       '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019', '\u001A', '\u001B', '\u001C',
       '\u001D', '\u001E', '\u001F', '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F',
-      '{', '[', ']', '^')
+      '{', '[', ']', '^', '.')

Review comment:
       What about the behavior of Apache Hive?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
sadikovi commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1002391822


   Well, it actually is affected. When I run the following commands in my branch: 
   ```
   CREATE TABLE escapeDots (key int) USING parquet PARTITIONED BY (value string);
   INSERT INTO TABLE escapeDots PARTITION(value = 'blah/') VALUES (1);
   INSERT INTO TABLE escapeDots PARTITION(value = 'blah{}') VALUES (2);
   ```
   
   I get the following output: 
   ```
   +--------------+
   |     partition|
   +--------------+
   | value=blah%2F|
   |value=blah%7B}|
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34995:
URL: https://github.com/apache/spark/pull/34995#discussion_r776110964



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
##########
@@ -52,7 +52,7 @@ object ExternalCatalogUtils {
       '\n', '\u000B', '\u000C', '\r', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012', '\u0013',
       '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019', '\u001A', '\u001B', '\u001C',
       '\u001D', '\u001E', '\u001F', '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F',
-      '{', '[', ']', '^')
+      '{', '[', ']', '^', '.')

Review comment:
       Actually, it would also break the forward compatibility too (old Spark versions to read the tables written by new Spark versions), which we don't guarantee in general but would be great to have a switch to control that in any event.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000592216


   **[Test build #146539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146539/testReport)** for PR 34995 at commit [`8a402b9`](https://github.com/apache/spark/commit/8a402b9a36901360e0f91c4f4824ade001bfa7de).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1002390354


   The changes to `SHOW PARTITIONS` looks incorrect. Escaping is an internal detail because we write partition values to directory names. This is not always true for modern data lake formats which store partition values and other metadata in log files. `SHOW PARTITIONS` should not be affected by this internal detail and always print the actual partition value.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1002382681


   AFAIK this code was copied from Hive, which means Hive can read any escaped chars, but it does not escape dot when writing. I don't see a compatibility issue here. I think we can even open a ticket for Hive and ask it to escape dot as well, to work with ABFS.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000113808


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146502/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
sadikovi commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000545069


   I think there is  a bug in the partitioning cast where the value is inferred using the raw value which could contain escaped characters. We escape column name but not the value! For example, if you have double value 4.5, you end up with trying to infer double from `4%2E5`. See https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L307-L311 and https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L491. Timestamps escape value separately in the "inferPartitionColumnValue" method.
   
   IMHO, "inferPartitionColumnValue" method should already take the actual value that was unescaped, not the raw one.
   
   @cloud-fan is this a known issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000079716


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50978/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000110569


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50978/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org