You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "HyukjinKwon (via GitHub)" <gi...@apache.org> on 2023/03/14 11:20:36 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request, #40416: [SPARK-42731][CONNECT][DOCS] Document Spark Connect configurations

HyukjinKwon opened a new pull request, #40416:
URL: https://github.com/apache/spark/pull/40416

   ### What changes were proposed in this pull request?
   
   This PR proposes to document the configuration of Spark Connect defined in https://github.com/apache/spark/blob/master/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala
   
   ### Why are the changes needed?
   
   To let users know which configuration are supported for Spark Connect.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it documents the configurations for Spark Connect.
   
   ### How was this patch tested?
   
   Linters in CI should verify this change.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #40416: [SPARK-42731][CONNECT][DOCS] Document Spark Connect configurations

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #40416:
URL: https://github.com/apache/spark/pull/40416#issuecomment-1469009905

   Merged to master and branch-3.4.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40416: [SPARK-42731][CONNECT][DOCS] Document Spark Connect configurations

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #40416:
URL: https://github.com/apache/spark/pull/40416#discussion_r1136357256


##########
docs/configuration.md:
##########
@@ -3138,6 +3138,69 @@ like shuffle, just replace "rpc" with "shuffle" in the property names except
 The default value for number of thread-related config keys is the minimum of the number of cores requested for 
 the driver or executor, or, in the absence of that value, the number of cores available for the JVM (with a hardcoded upper limit of 8).
 
+### Spark Connect
+
+#### Server Configuration
+
+Server configurations are set in Spark Connect server, for example, when you start the Spark Connect server with `./sbin/start-connect-server.sh`.
+They are typically set via the config file and command-lineoptions with `--conf/-c`.
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
+<tr>
+  <td><code>spark.connect.grpc.binding.port</code></td>
+  <td>
+    15002
+  </td>
+  <td>Port for Spark Connect server to bind.</td>
+  <td>3.4.0</td>
+</tr>
+<tr>
+  <td><code>spark.connect.grpc.interceptor.classes</code></td>
+  <td>
+    (none)
+  </td>
+  <td>Comma separated list of class names that must implement the <code>io.grpc.ServerInterceptor</code> interface</td>
+  <td>3.4.0</td>
+</tr>
+<tr>
+  <td><code>spark.connect.grpc.arrow.maxBatchSize</code></td>
+  <td>
+    4m
+  </td>
+  <td>When using Apache Arrow, limit the maximum size of one arrow batch that can be sent from server side to client side. Currently, we conservatively use 70% of it because the size is not accurate but estimated.</td>

Review Comment:
   This actually matches with the docs in https://github.com/apache/spark/blob/master/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala. I think we should better do that separately; otherwise it will require a full build because of the code change.
   
   Please feel free to make a minor PR if you're interested in this :-).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] bjornjorgensen commented on a diff in pull request #40416: [SPARK-42731][CONNECT][DOCS] Document Spark Connect configurations

Posted by "bjornjorgensen (via GitHub)" <gi...@apache.org>.
bjornjorgensen commented on code in PR #40416:
URL: https://github.com/apache/spark/pull/40416#discussion_r1135455222


##########
docs/configuration.md:
##########
@@ -3138,6 +3138,69 @@ like shuffle, just replace "rpc" with "shuffle" in the property names except
 The default value for number of thread-related config keys is the minimum of the number of cores requested for 
 the driver or executor, or, in the absence of that value, the number of cores available for the JVM (with a hardcoded upper limit of 8).
 
+### Spark Connect
+
+#### Server Configuration
+
+Server configurations are set in Spark Connect server, for example, when you start the Spark Connect server with `./sbin/start-connect-server.sh`.
+They are typically set via the config file and command-lineoptions with `--conf/-c`.
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
+<tr>
+  <td><code>spark.connect.grpc.binding.port</code></td>
+  <td>
+    15002
+  </td>
+  <td>Port for Spark Connect server to bind.</td>
+  <td>3.4.0</td>
+</tr>
+<tr>
+  <td><code>spark.connect.grpc.interceptor.classes</code></td>
+  <td>
+    (none)
+  </td>
+  <td>Comma separated list of class names that must implement the <code>io.grpc.ServerInterceptor</code> interface</td>
+  <td>3.4.0</td>
+</tr>
+<tr>
+  <td><code>spark.connect.grpc.arrow.maxBatchSize</code></td>
+  <td>
+    4m
+  </td>
+  <td>When using Apache Arrow, limit the maximum size of one arrow batch that can be sent from server side to client side. Currently, we conservatively use 70% of it because the size is not accurate but estimated.</td>

Review Comment:
   limit the maximum size of one arrow batch that can be sent from server side to client side. Currently, we conservatively use 70% of it because the size is not accurate but estimated.
     
   -> 
   limit the maximum size of each arrow batch sent from the server to the client. Currently, we are using a conservative estimate of 70% of the maximum size, since the actual size cannot be accurately determined.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #40416: [SPARK-42731][CONNECT][DOCS] Document Spark Connect configurations

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #40416: [SPARK-42731][CONNECT][DOCS] Document Spark Connect configurations
URL: https://github.com/apache/spark/pull/40416


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org