Posted to reviews@spark.apache.org by "nchammas (via GitHub)" <gi...@apache.org> on 2024/01/21 21:52:34 UTC

Re: [PR] [SPARK-44867][CONNECT][DOCS] Refactor Spark Connect Docs to incorporate Scala setup [spark]

nchammas commented on code in PR #42556:
URL: https://github.com/apache/spark/pull/42556#discussion_r1461157963


##########
docs/spark-connect-overview.md:
##########
@@ -219,29 +195,175 @@ Now you can run PySpark code in the shell to see Spark Connect in action:
 |  2|Maria|
 +---+-----+
 {% endhighlight %}
+
 </div>
 
+<div data-lang="scala"  markdown="1">
+For the Scala shell, we use an Ammonite-based REPL that is currently not included in the Apache Spark package.
+
+To set up the new Scala shell, first download and install [Coursier CLI](https://get-coursier.io/docs/cli-installation).
+Then, install the REPL using the following command in a terminal window:
+{% highlight bash %}
+cs install --contrib spark-connect-repl
+{% endhighlight %}
+
+Now you can start the Ammonite-based Scala REPL (shell) and connect it to your Spark server:
+
+{% highlight bash %}
+spark-connect-repl
+{% endhighlight %}
+
+A greeting message will appear when the REPL successfully initializes:
+{% highlight bash %}
+Spark session available as 'spark'.
+   _____                  __      ______                            __
+  / ___/____  ____ ______/ /__   / ____/___  ____  ____  ___  _____/ /_
+  \__ \/ __ \/ __ `/ ___/ //_/  / /   / __ \/ __ \/ __ \/ _ \/ ___/ __/
+ ___/ / /_/ / /_/ / /  / ,<    / /___/ /_/ / / / / / / /  __/ /__/ /_
+/____/ .___/\__,_/_/  /_/|_|   \____/\____/_/ /_/_/ /_/\___/\___/\__/
+    /_/
+{% endhighlight %}
+
+By default, the REPL will attempt to connect to a local Spark server.
+Run the following Scala code in the shell to see Spark Connect in action:
+
+{% highlight scala %}
+@ spark.range(10).count
+res0: Long = 10L
+{% endhighlight %}
+
+### Configure client-server connection
+
+By default, the REPL will attempt to connect to a local Spark server on port 15002.
+The connection can, however, be configured in several ways, as described in this configuration
+[reference](https://github.com/apache/spark/blob/master/connector/connect/docs/client-connection-string.md).
+
+#### Set SPARK_REMOTE environment variable
+
+Set the `SPARK_REMOTE` environment variable on the client machine to customize the client-server
+connection that the REPL initializes at startup.
+
+{% highlight bash %}
+export SPARK_REMOTE="sc://myhost.com:443/;token=ABCDEFG"
+spark-connect-repl
+{% endhighlight %}
+or
+{% highlight bash %}
+SPARK_REMOTE="sc://myhost.com:443/;token=ABCDEFG" spark-connect-repl
+{% endhighlight %}
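
As a rough illustration of the connection string format, the Scala sketch below splits a `SPARK_REMOTE` value into host, port, and `;key=value` parameters, falling back to `localhost` on port 15002 when the variable is unset. It assumes the simplified `sc://host:port/;token=...` layout used in the examples above; `SparkRemoteSketch`, `Endpoint`, `parse`, and `resolve` are hypothetical names, not part of the Spark Connect client, whose own parser supports more parameters and stricter validation.

```scala
import java.net.URI

// Hypothetical, simplified reading of an `sc://` connection string.
// This is NOT the parser used by the Spark Connect client.
object SparkRemoteSketch {
  // Port the REPL uses when none is given, per the docs above.
  val DefaultPort = 15002

  final case class Endpoint(host: String, port: Int, params: Map[String, String])

  def parse(connectionString: String): Endpoint = {
    val uri = new URI(connectionString)
    require(uri.getScheme == "sc", s"expected an sc:// URI, got: $connectionString")
    val port = if (uri.getPort == -1) DefaultPort else uri.getPort
    // Parameters follow the path as ";key=value" pairs, e.g. "/;token=ABCDEFG".
    val params = Option(uri.getPath).getOrElse("")
      .stripPrefix("/")
      .split(";")
      .filter(_.nonEmpty)
      .map { kv =>
        kv.split("=", 2) match {
          case Array(k, v) => k -> v
          case Array(k)    => k -> ""
        }
      }
      .toMap
    Endpoint(uri.getHost, port, params)
  }

  // Assumed startup behavior: use SPARK_REMOTE if set, otherwise
  // fall back to a local server on the default port.
  def resolve(env: Map[String, String]): Endpoint =
    parse(env.getOrElse("SPARK_REMOTE", s"sc://localhost:$DefaultPort"))
}
```

For example, after the `export` shown above, `SparkRemoteSketch.resolve(sys.env)` would yield host `myhost.com`, port 443, and the parameters `Map("token" -> "ABCDEFG")`.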
+
+#### Use CLI arguments
+
+These settings can also be passed as CLI arguments:
+{% highlight bash %}
+spark-connect-repl --host myhost.com --port 443 --token ABCDEFG
+{% endhighlight %}
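
To compare the two styles, here is a hypothetical Scala helper that turns the three flags shown above into an equivalent `sc://` string. It assumes `--host`, `--port`, and `--token` map one-to-one onto the host, port, and `token` parameter of the connection string, with the same local defaults; `CliToRemoteSketch` and `fromArgs` are illustrative names, and this is not how the REPL itself parses its arguments (it accepts other options too).

```scala
// Hypothetical helper for illustration only: builds the `sc://` connection
// string that the --host, --port and --token flags appear to describe.
object CliToRemoteSketch {
  private val Supported = Set("--host", "--port", "--token")

  def fromArgs(args: List[String]): String = {
    // Collect "--flag value" pairs; an unknown or dangling flag is an error.
    @annotation.tailrec
    def loop(rest: List[String], acc: Map[String, String]): Map[String, String] =
      rest match {
        case Nil => acc
        case flag :: value :: tail if Supported(flag) =>
          loop(tail, acc + (flag.stripPrefix("--") -> value))
        case other :: _ =>
          throw new IllegalArgumentException(s"unsupported or incomplete argument: $other")
      }

    val opts = loop(args, Map.empty)
    // Assumed defaults mirror the local server on port 15002.
    val host = opts.getOrElse("host", "localhost")
    val port = opts.getOrElse("port", "15002")
    val params = opts.get("token").map(t => s";token=$t").getOrElse("")
    s"sc://$host:$port/$params"
  }
}
```

With the arguments from the example, `CliToRemoteSketch.fromArgs(List("--host", "myhost.com", "--port", "443", "--token", "ABCDEFG"))` returns `sc://myhost.com:443/;token=ABCDEFG`, the same value used in the `SPARK_REMOTE` examples.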
+
+The list of supported CLI arguments can be found [here](https://github.com/apache/spark/blob/master/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClientParser.scala#L48).

Review Comment:
   This link is now broken. What should it be updated to?
   
   In the future please always link to a specific commit (rather than `master`) so that file and line links don't break as the code changes. (Tip: If you hit `y` in your browser, GitHub will convert the URL of the file you are looking at to a fixed commit.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org