You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "grundprinzip (via GitHub)" <gi...@apache.org> on 2023/03/15 16:45:34 UTC

[GitHub] [spark] grundprinzip opened a new pull request, #40447: [SPARK-42816] Support Max Message size up to 128MB

grundprinzip opened a new pull request, #40447:
URL: https://github.com/apache/spark/pull/40447

   ### What changes were proposed in this pull request?
   
   This change lifts the default message size of 4MB to 128MB and makes it configurable. While 128MB is a "random number" it supports creating DataFrames from reasonably sized local data without failing.
   
   ### Why are the changes needed?
   Usability
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Manual


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB
URL: https://github.com/apache/spark/pull/40447


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40447:
URL: https://github.com/apache/spark/pull/40447#discussion_r1137914674


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,13 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")

Review Comment:
   +1 for documenting.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40447:
URL: https://github.com/apache/spark/pull/40447#discussion_r1137915076


##########
python/pyspark/sql/connect/client.py:
##########
@@ -122,6 +122,8 @@ class ChannelBuilder:
     PARAM_TOKEN = "token"
     PARAM_USER_ID = "user_id"
     PARAM_USER_AGENT = "user_agent"
+    MAX_MESSAGE_LENGTH = 128 * 1024 * 1024
+    MAX_METADATA_LENGTH = 16 * 1024 * 1024

Review Comment:
   This seems to be unused in this PR. Do we need to add this in this `Support Max Message size up to 128MB` PR?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40447:
URL: https://github.com/apache/spark/pull/40447#discussion_r1137915076


##########
python/pyspark/sql/connect/client.py:
##########
@@ -122,6 +122,8 @@ class ChannelBuilder:
     PARAM_TOKEN = "token"
     PARAM_USER_ID = "user_id"
     PARAM_USER_AGENT = "user_agent"
+    MAX_MESSAGE_LENGTH = 128 * 1024 * 1024
+    MAX_METADATA_LENGTH = 16 * 1024 * 1024

Review Comment:
   This seems to be unused in this PR. Do we need to add this in this PR?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on code in PR #40447:
URL: https://github.com/apache/spark/pull/40447#discussion_r1138008293


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,13 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")
+      .doc("Set the default inbound message size for the GRPC requests.")
+      .version("3.4.0")
+      .bytesConf(ByteUnit.MiB)

Review Comment:
   Should be `ByteUnit.BYTE`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on code in PR #40447:
URL: https://github.com/apache/spark/pull/40447#discussion_r1138007108


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,13 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")

Review Comment:
   +1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40447: [SPARK-42816] Support Max Message size up to 128MB

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #40447:
URL: https://github.com/apache/spark/pull/40447#discussion_r1137904888


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,13 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")

Review Comment:
   If this is a user-facing conf, should probably document at https://github.com/apache/spark/blob/master/docs/configuration.md#spark-connect



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on code in PR #40447:
URL: https://github.com/apache/spark/pull/40447#discussion_r1138163227


##########
python/pyspark/sql/connect/client.py:
##########
@@ -122,6 +122,8 @@ class ChannelBuilder:
     PARAM_TOKEN = "token"
     PARAM_USER_ID = "user_id"
     PARAM_USER_AGENT = "user_agent"
+    MAX_MESSAGE_LENGTH = 128 * 1024 * 1024
+    MAX_METADATA_LENGTH = 16 * 1024 * 1024

Review Comment:
   No, this is a leftover. Removing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #40447:
URL: https://github.com/apache/spark/pull/40447#issuecomment-1478787974

   Merged to master/branch-3.4. Thank you, @grundprinzip and all!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on PR #40447:
URL: https://github.com/apache/spark/pull/40447#issuecomment-1478275510

   LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] pan3793 commented on a diff in pull request #40447: [SPARK-42816] Support Max Message size up to 128MB

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #40447:
URL: https://github.com/apache/spark/pull/40447#discussion_r1137446054


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,13 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")
+      .doc("Set the default inbound message size for the GRPC requests.")

Review Comment:
   nit: "GRPC" -> "gRPC"
   
   The "default" is misleading, because it's actually a "boundary" or "limitation"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] pan3793 commented on a diff in pull request #40447: [SPARK-42816] Support Max Message size up to 128MB

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #40447:
URL: https://github.com/apache/spark/pull/40447#discussion_r1137446054


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,13 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")
+      .doc("Set the default inbound message size for the GRPC requests.")

Review Comment:
   nit: "GRPC" -> "gRPC"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on code in PR #40447:
URL: https://github.com/apache/spark/pull/40447#discussion_r1142972792


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,16 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")
+      .doc("""
+          |Sets the maximum inbound message size for the gRPC requests.

Review Comment:
   the unit is in bytes



##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,16 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")
+      .doc("""
+          |Sets the maximum inbound message size for the gRPC requests.
+          |Requests with a larger payload will fail.

Review Comment:
   Fixed.



##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,16 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")
+      .doc("""
+          |Sets the maximum inbound message size for the gRPC requests.
+          |Requests with a larger payload will fail.
+          |"""")

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on code in PR #40447:
URL: https://github.com/apache/spark/pull/40447#discussion_r1143057560


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,16 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")
+      .doc("""
+          |Sets the maximum inbound message size for the gRPC requests.

Review Comment:
   updated the doc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #40447:
URL: https://github.com/apache/spark/pull/40447#discussion_r1138560168


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,16 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")
+      .doc("""
+          |Sets the maximum inbound message size for the gRPC requests.

Review Comment:
   Could you specify units, for example:
   ```suggestion
             |Sets the maximum inbound message size in megabytes for the gRPC requests.
   ```



##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,16 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")
+      .doc("""
+          |Sets the maximum inbound message size for the gRPC requests.
+          |Requests with a larger payload will fail.

Review Comment:
   I don't think that injecting new lines is nice approach if we are going to format config docs somewhere else.



##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,16 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")
+      .doc("""
+          |Sets the maximum inbound message size for the gRPC requests.
+          |Requests with a larger payload will fail.
+          |"""")

Review Comment:
   ```suggestion
             |""")
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on code in PR #40447:
URL: https://github.com/apache/spark/pull/40447#discussion_r1138167890


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,13 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")

Review Comment:
   Done



##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala:
##########
@@ -47,6 +47,13 @@ object Connect {
       .bytesConf(ByteUnit.MiB)
       .createWithDefaultString("4m")
 
+  val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
+    ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")
+      .doc("Set the default inbound message size for the GRPC requests.")
+      .version("3.4.0")
+      .bytesConf(ByteUnit.MiB)

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org