You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "zhengruifeng (via GitHub)" <gi...@apache.org> on 2023/10/26 04:03:01 UTC

[PR] [SPARK-43923][CONNECT][FOLLOWUP] Correct the message abbreviation [spark]

zhengruifeng opened a new pull request, #43535:
URL: https://github.com/apache/spark/pull/43535

   ### What changes were proposed in this pull request?
   1, truncate raw bytes (udf/udtf/local relation) with `MAX_BYTES_SIZE`;
   2, pass `maxStringSize` to abbreviate nested messages;
   
   ### Why are the changes needed?
   1, there is only one place which specify the `maxStringSize`, with value `MAX_STATEMENT_TEXT_SIZE = 65535`. By its name, it is used to truncate the SQL statements which are always strings. No need to affect raw bytes;
   
   2, according to the implementation of `Message.toString`:
   
   https://github.com/protocolbuffers/protobuf/blob/main/java/core/src/main/java/com/google/protobuf/TextFormat.java#L567-L574
   
   the value of bytes fields can be `ByteString` or `byte[]`, so the two branches should be consistent.
   
   3, `maxStringSize` only affect the top-level string fields, it should also be used in nested messages.
   
   
   ### Does this PR introduce _any_ user-facing change?
   no
   
   ### How was this patch tested?
   ci
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43923][CONNECT][FOLLOWUP] Correct the message abbreviation [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43535:
URL: https://github.com/apache/spark/pull/43535#discussion_r1372887878


##########
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/ProtoUtils.scala:
##########
@@ -54,28 +57,28 @@ private[connect] object ProtoUtils {
           if field.getJavaType == FieldDescriptor.JavaType.BYTE_STRING && byteArray != null =>
         val size = byteArray.size
         if (size > MAX_BYTES_SIZE) {
-          val prefix = byteArray.take(MAX_BYTES_SIZE)
-          builder.setField(field, createByteString(prefix, size))
+          builder.setField(
+            field,
+            ByteString
+              .copyFrom(byteArray, 0, MAX_BYTES_SIZE)

Review Comment:
   Simply from the perspective of method implementation, if the original characters wrapped by ByteString are Unicode in the range of 2048 to 65535, their UTF-8 encoding will be a three-character sequence, and truncation to 8 bytes may result in garbled text.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43923][CONNECT][FOLLOWUP] Correct the message abbreviation [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #43535:
URL: https://github.com/apache/spark/pull/43535#issuecomment-1780380221

   cc @HyukjinKwon @jdesjean 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43923][CONNECT][FOLLOWUP] Correct the message abbreviation [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43535:
URL: https://github.com/apache/spark/pull/43535#issuecomment-1781214368

   Merged into master. Thanks @zhengruifeng @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43923][CONNECT][FOLLOWUP] Correct the message abbreviation [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang closed pull request #43535: [SPARK-43923][CONNECT][FOLLOWUP]  Correct the message abbreviation
URL: https://github.com/apache/spark/pull/43535


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43923][CONNECT][FOLLOWUP] Correct the message abbreviation [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #43535:
URL: https://github.com/apache/spark/pull/43535#discussion_r1372929380


##########
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/ProtoUtils.scala:
##########
@@ -54,28 +57,28 @@ private[connect] object ProtoUtils {
           if field.getJavaType == FieldDescriptor.JavaType.BYTE_STRING && byteArray != null =>
         val size = byteArray.size
         if (size > MAX_BYTES_SIZE) {
-          val prefix = byteArray.take(MAX_BYTES_SIZE)
-          builder.setField(field, createByteString(prefix, size))
+          builder.setField(
+            field,
+            ByteString
+              .copyFrom(byteArray, 0, MAX_BYTES_SIZE)

Review Comment:
   It always show the hex string in UI, don't need to take specified charset into account



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43923][CONNECT][FOLLOWUP] Correct the message abbreviation [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43535:
URL: https://github.com/apache/spark/pull/43535#discussion_r1373247424


##########
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/ProtoUtils.scala:
##########
@@ -54,28 +57,28 @@ private[connect] object ProtoUtils {
           if field.getJavaType == FieldDescriptor.JavaType.BYTE_STRING && byteArray != null =>
         val size = byteArray.size
         if (size > MAX_BYTES_SIZE) {
-          val prefix = byteArray.take(MAX_BYTES_SIZE)
-          builder.setField(field, createByteString(prefix, size))
+          builder.setField(
+            field,
+            ByteString
+              .copyFrom(byteArray, 0, MAX_BYTES_SIZE)

Review Comment:
   Got it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org