Posted to reviews@spark.apache.org by "wangyum (via GitHub)" <gi...@apache.org> on 2023/02/20 05:25:52 UTC

[GitHub] [spark] wangyum opened a new pull request, #40090: [SPARK-41741][SQL] Encode the string using the UTF_8 charset in ParquetFilters

wangyum opened a new pull request, #40090:
URL: https://github.com/apache/spark/pull/40090

   ### What changes were proposed in this pull request?
   
   This PR makes `ParquetFilters` encode strings using the `UTF_8` charset instead of the JVM default charset.
   
   ### Why are the changes needed?
   
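   Fix a data issue that occurs when the JVM default charset is not `UTF_8`. Strings in pushed-down filters (for example `StringStartsWith`) were encoded with the default charset, so their bytes could differ from the UTF-8 bytes stored in Parquet files.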
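To illustrate why the charset matters, here is a minimal Scala sketch. It is not the actual patch: the object and helper names are hypothetical, and ISO-8859-1 stands in for an arbitrary non-UTF-8 default charset. It compares a string prefix against UTF-8 bytes byte-by-byte, the way a byte-wise prefix filter might.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical demo object; not part of Spark or Parquet.
object PrefixEncodingDemo {
  // Parquet STRING values are stored as UTF-8 encoded bytes.
  def storedBytes(value: String): Array[Byte] =
    value.getBytes(StandardCharsets.UTF_8)

  // Encodes the filter prefix with an explicit charset. Calling
  // prefix.getBytes() with no argument would use the JVM default charset.
  def prefixBytes(prefix: String, charset: Charset): Array[Byte] =
    prefix.getBytes(charset)

  // True if `value` begins with exactly the bytes in `prefix`.
  def startsWithBytes(value: Array[Byte], prefix: Array[Byte]): Boolean =
    value.length >= prefix.length && value.take(prefix.length).sameElements(prefix)

  def main(args: Array[String]): Unit = {
    val stored = storedBytes("\u00e4pfel") // "äpfel" as UTF-8: C3 A4 70 66 65 6C
    val prefix = "\u00e4p"                 // "äp"
    // UTF-8 prefix bytes (C3 A4 70) match the stored bytes: prints true.
    println(startsWithBytes(stored, prefixBytes(prefix, StandardCharsets.UTF_8)))
    // ISO-8859-1 encodes 'ä' as the single byte E4, so the prefix no longer matches: prints false.
    println(startsWithBytes(stored, prefixBytes(prefix, StandardCharsets.ISO_8859_1)))
  }
}
```

With a non-UTF-8 encoding, the same logical prefix produces different bytes, so a byte-level comparison against UTF-8 data can wrongly report "no match". Encoding with an explicit `UTF_8` charset removes the dependency on the JVM default.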
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40090: [SPARK-41741][SQL] Encode the string using the UTF_8 charset in ParquetFilters

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #40090:
URL: https://github.com/apache/spark/pull/40090#discussion_r1111488834


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala:
##########
@@ -2133,6 +2134,30 @@ abstract class ParquetFilterSuite extends QueryTest with ParquetTest with Shared
       }
     }
   }
+
+  test("SPARK-41741: StringStartsWith should encode the string using the UTF_8 charset") {
+    // A hacky way to set the default Java character encoding.
+    def setDefaultEncoding(charset: Charset): Unit = {
+      System.setProperty("file.encoding", charset.name())
+      val defaultCharsetField = classOf[Charset].getDeclaredField("defaultCharset")
+      defaultCharsetField.setAccessible(true)

Review Comment:
   I think it's fine without this test. It's too hacky.


[GitHub] [spark] wangyum closed pull request #40090: [SPARK-41741][SQL] Encode the string using the UTF_8 charset in ParquetFilters

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum closed pull request #40090: [SPARK-41741][SQL] Encode the string using the UTF_8 charset in ParquetFilters
URL: https://github.com/apache/spark/pull/40090


[GitHub] [spark] wangyum commented on pull request #40090: [SPARK-41741][SQL] Encode the string using the UTF_8 charset in ParquetFilters

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on PR #40090:
URL: https://github.com/apache/spark/pull/40090#issuecomment-1436809590

   Merged to master, branch-3.4 and branch-3.3.
