Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2023/07/11 04:48:28 UTC

[GitHub] [spark] itholic opened a new pull request, #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

itholic opened a new pull request, #41931:
URL: https://github.com/apache/spark/pull/41931

   ### What changes were proposed in this pull request?
   
   This PR enables the SQL parity test `test_sql_with_python_objects` for pandas API on Spark with Spark Connect.
   
   ### Why are the changes needed?
   
   To increase the API coverage for pandas API on Spark with Spark Connect.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   This enables `ps.sql` with Python objects.
   
   ### How was this patch tested?
   
   Reuse the existing SQL tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #41931:
URL: https://github.com/apache/spark/pull/41931#discussion_r1259296353


##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]:
             val._to_spark().createOrReplaceTempView(df_name)
             return df_name
         elif isinstance(val, str):
-            return lit(val)._jc.expr().sql()  # for escaped characters.
+            if is_remote():
+                special_characters = ["\\", "'", '"', "\0", "\b", "\n", "\r", "\t"]
+                escaped_val = val
+                for char in special_characters:
+                    escaped_val = escaped_val.replace(char, "\\" + char)
+                return f"'{escaped_val}'"

Review Comment:
   This is too hacky. Does this exactly match the JVM implementation?





[GitHub] [spark] itholic commented on pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #41931:
URL: https://github.com/apache/spark/pull/41931#issuecomment-1631788873

   Seems like the CI is broken. The failure is not related to my changes.




[GitHub] [spark] zhengruifeng commented on pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #41931:
URL: https://github.com/apache/spark/pull/41931#issuecomment-1631794331

   merged to master




[GitHub] [spark] itholic commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #41931:
URL: https://github.com/apache/spark/pull/41931#discussion_r1259367995


##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]:
             val._to_spark().createOrReplaceTempView(df_name)
             return df_name
         elif isinstance(val, str):
-            return lit(val)._jc.expr().sql()  # for escaped characters.
+            if is_remote():
+                special_characters = ["\\", "'", '"', "\0", "\b", "\n", "\r", "\t"]
+                escaped_val = val
+                for char in special_characters:
+                    escaped_val = escaped_val.replace(char, "\\" + char)
+                return f"'{escaped_val}'"

Review Comment:
   Yeah, actually I just referred to the JVM implementation for handling `StringType`, as below, because I believe this part only runs when `val` is a `str`:
   
   ```scala
       case (v: UTF8String, StringType) =>
         // Escapes all backslashes and single quotes.
         "'" + v.toString.replace("\\", "\\\\").replace("'", "\\'") + "'"
   ```





[GitHub] [spark] zhengruifeng commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #41931:
URL: https://github.com/apache/spark/pull/41931#discussion_r1259317780


##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]:
             val._to_spark().createOrReplaceTempView(df_name)
             return df_name
         elif isinstance(val, str):
-            return lit(val)._jc.expr().sql()  # for escaped characters.
+            if is_remote():
+                special_characters = ["\\", "'", '"', "\0", "\b", "\n", "\r", "\t"]
+                escaped_val = val
+                for char in special_characters:
+                    escaped_val = escaped_val.replace(char, "\\" + char)
+                return f"'{escaped_val}'"

Review Comment:
   I cannot find a workaround based on the DataFrame APIs; not sure whether a new proto message is needed.





[GitHub] [spark] zhengruifeng commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #41931:
URL: https://github.com/apache/spark/pull/41931#discussion_r1259403841


##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]:
             val._to_spark().createOrReplaceTempView(df_name)
             return df_name
         elif isinstance(val, str):
-            return lit(val)._jc.expr().sql()  # for escaped characters.
+            if is_remote():
+                special_characters = ["\\", "'", '"', "\0", "\b", "\n", "\r", "\t"]

Review Comment:
   Also, please add comments and a pointer.





[GitHub] [spark] zhengruifeng commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #41931:
URL: https://github.com/apache/spark/pull/41931#discussion_r1259314468


##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]:
             val._to_spark().createOrReplaceTempView(df_name)
             return df_name
         elif isinstance(val, str):
-            return lit(val)._jc.expr().sql()  # for escaped characters.
+            if is_remote():
+                special_characters = ["\\", "'", '"', "\0", "\b", "\n", "\r", "\t"]
+                escaped_val = val
+                for char in special_characters:
+                    escaped_val = escaped_val.replace(char, "\\" + char)
+                return f"'{escaped_val}'"

Review Comment:
   +1, `Literal.sql` seems much more complex https://github.com/apache/spark/blob/1c057f5959c922ff59fd7648ece0dbe9a16aa286/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L476-L541





[GitHub] [spark] itholic commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #41931:
URL: https://github.com/apache/spark/pull/41931#discussion_r1259207794


##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -265,7 +266,10 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]:
             val._to_spark().createOrReplaceTempView(df_name)
             return df_name
         elif isinstance(val, str):
-            return lit(val)._jc.expr().sql()  # for escaped characters.
+            if is_remote():
+                return f"'{val}'"

Review Comment:
   Added a test case and updated the code to handle escaped characters more properly. Thanks!





[GitHub] [spark] zhengruifeng closed pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng closed pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect
URL: https://github.com/apache/spark/pull/41931




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #41931:
URL: https://github.com/apache/spark/pull/41931#discussion_r1259401203


##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]:
             val._to_spark().createOrReplaceTempView(df_name)
             return df_name
         elif isinstance(val, str):
-            return lit(val)._jc.expr().sql()  # for escaped characters.
+            if is_remote():
+                special_characters = ["\\", "'", '"', "\0", "\b", "\n", "\r", "\t"]
+                escaped_val = val
+                for char in special_characters:
+                    escaped_val = escaped_val.replace(char, "\\" + char)
+                return f"'{escaped_val}'"

Review Comment:
   Okay, can you leave a comment here noting that this implementation matches the JVM side?





[GitHub] [spark] itholic commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #41931:
URL: https://github.com/apache/spark/pull/41931#discussion_r1260402845


##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -265,7 +266,13 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]:
             val._to_spark().createOrReplaceTempView(df_name)
             return df_name
         elif isinstance(val, str):
-            return lit(val)._jc.expr().sql()  # for escaped characters.
+            if is_remote():
+                # This is matched to behavior from JVM implementation.
+                # See `sql` definition from `sql/catalyst/src/main/scala/org/apache/spark/
+                # sql/catalyst/expressions/literals.scala`
+                return "'" + val.replace("\\", "\\\\").replace("'", "\\'") + "'"
+            else:
+                return lit(val)._jc.expr().sql()  # for escaped characters.

Review Comment:
   Sounds good





[GitHub] [spark] itholic commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #41931:
URL: https://github.com/apache/spark/pull/41931#discussion_r1259367995


##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]:
             val._to_spark().createOrReplaceTempView(df_name)
             return df_name
         elif isinstance(val, str):
-            return lit(val)._jc.expr().sql()  # for escaped characters.
+            if is_remote():
+                special_characters = ["\\", "'", '"', "\0", "\b", "\n", "\r", "\t"]
+                escaped_val = val
+                for char in special_characters:
+                    escaped_val = escaped_val.replace(char, "\\" + char)
+                return f"'{escaped_val}'"

Review Comment:
   Yeah, actually I just referred to the JVM implementation for handling `str`, as below:
   
   ```scala
       case (v: UTF8String, StringType) =>
         // Escapes all backslashes and single quotes.
         "'" + v.toString.replace("\\", "\\\\").replace("'", "\\'") + "'"
   ```
   
   Seems like they only care about the backslash and the single quotation mark, but I additionally added some more possible special characters to `special_characters`.
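   For reference, the escaping loop from the patch under review can be sketched as a standalone function (`escape_with_specials` is a hypothetical name for illustration). The ordering of `special_characters` matters: the backslash must be escaped first, because the later replacements insert backslashes that would otherwise be doubled.
   
   ```python
   def escape_with_specials(val: str) -> str:
       # Sketch of the loop from the patch under review. "\\" must come
       # first: later replacements insert backslashes, and escaping "\\"
       # afterwards would double them.
       special_characters = ["\\", "'", '"', "\0", "\b", "\n", "\r", "\t"]
       escaped_val = val
       for char in special_characters:
           # Note: for "\n" and the other control characters this yields a
           # backslash followed by the literal control character, not the
           # two-character escape sequence "\\n".
           escaped_val = escaped_val.replace(char, "\\" + char)
       return f"'{escaped_val}'"
   ```
   
   The reviewers ultimately asked to drop the extra characters and match the JVM's two-replacement rule exactly, which is what the merged version does.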





[GitHub] [spark] zhengruifeng commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #41931:
URL: https://github.com/apache/spark/pull/41931#discussion_r1259403299


##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]:
             val._to_spark().createOrReplaceTempView(df_name)
             return df_name
         elif isinstance(val, str):
-            return lit(val)._jc.expr().sql()  # for escaped characters.
+            if is_remote():
+                special_characters = ["\\", "'", '"', "\0", "\b", "\n", "\r", "\t"]

Review Comment:
   let's exactly match `"'" + v.toString.replace("\\", "\\\\").replace("'", "\\'") + "'"` in the JVM side





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #41931:
URL: https://github.com/apache/spark/pull/41931#discussion_r1259720787


##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -265,7 +266,13 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]:
             val._to_spark().createOrReplaceTempView(df_name)
             return df_name
         elif isinstance(val, str):
-            return lit(val)._jc.expr().sql()  # for escaped characters.
+            if is_remote():
+                # This is matched to behavior from JVM implementation.
+                # See `sql` definition from `sql/catalyst/src/main/scala/org/apache/spark/
+                # sql/catalyst/expressions/literals.scala`
+                return "'" + val.replace("\\", "\\\\").replace("'", "\\'") + "'"
+            else:
+                return lit(val)._jc.expr().sql()  # for escaped characters.

Review Comment:
   ```suggestion
               # This is matched to behavior from JVM implementation.
               # See `sql` definition from `sql/catalyst/src/main/scala/org/apache/spark/
               # sql/catalyst/expressions/literals.scala`
               return "'" + val.replace("\\", "\\\\").replace("'", "\\'") + "'"
   ```





[GitHub] [spark] zhengruifeng commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #41931:
URL: https://github.com/apache/spark/pull/41931#discussion_r1259184198


##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -265,7 +266,10 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]:
             val._to_spark().createOrReplaceTempView(df_name)
             return df_name
         elif isinstance(val, str):
-            return lit(val)._jc.expr().sql()  # for escaped characters.
+            if is_remote():
+                return f"'{val}'"

Review Comment:
   I think there is behavior change:
   
   ```
   In [1]: from pyspark.sql.functions import *
   
   In [2]: val = "a'''c''d"
   
   In [3]: lit(val)._jc.expr().sql()
   Out[3]: "'a\\'\\'\\'c\\'\\'d'"
   ```
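   The JVM behavior shown above can be reproduced in plain Python with two chained `str.replace` calls, backslashes first, then single quotes (`escape_sql_string` is a hypothetical name for illustration):
   
   ```python
   def escape_sql_string(val: str) -> str:
       # Escape backslashes first so the backslashes introduced for the
       # quotes are not doubled, then escape single quotes; finally wrap
       # the result in single quotes, mirroring Literal.sql on the JVM.
       return "'" + val.replace("\\", "\\\\").replace("'", "\\'") + "'"
   
   # Same input as in the example above; the repr of the result is
   # "'a\\'\\'\\'c\\'\\'d'", matching Out[3].
   escape_sql_string("a'''c''d")
   ```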





[GitHub] [spark] itholic commented on pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #41931:
URL: https://github.com/apache/spark/pull/41931#issuecomment-1630110632

   cc @zhengruifeng FYI






[GitHub] [spark] itholic commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #41931:
URL: https://github.com/apache/spark/pull/41931#discussion_r1259424856


##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]:
             val._to_spark().createOrReplaceTempView(df_name)
             return df_name
         elif isinstance(val, str):
-            return lit(val)._jc.expr().sql()  # for escaped characters.
+            if is_remote():
+                special_characters = ["\\", "'", '"', "\0", "\b", "\n", "\r", "\t"]

Review Comment:
   Sure, let me update the comments. Thanks!





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #41931:
URL: https://github.com/apache/spark/pull/41931#discussion_r1259721098


##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -265,7 +266,13 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]:
             val._to_spark().createOrReplaceTempView(df_name)
             return df_name
         elif isinstance(val, str):
-            return lit(val)._jc.expr().sql()  # for escaped characters.
+            if is_remote():
+                # This is matched to behavior from JVM implementation.
+                # See `sql` definition from `sql/catalyst/src/main/scala/org/apache/spark/
+                # sql/catalyst/expressions/literals.scala`
+                return "'" + val.replace("\\", "\\\\").replace("'", "\\'") + "'"
+            else:
+                return lit(val)._jc.expr().sql()  # for escaped characters.

Review Comment:
   Actually, let's remove the JVM access here then.


