You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "hvanhovell (via GitHub)" <gi...@apache.org> on 2023/03/22 03:06:45 UTC

[GitHub] [spark] hvanhovell opened a new pull request, #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

hvanhovell opened a new pull request, #40515:
URL: https://github.com/apache/spark/pull/40515

   ### What changes were proposed in this pull request?
   This PR adds Ammonite REPL integration for Spark Connect. This has a couple of benefits:
   - It makes it a lot less cumbersome for users to start a spark connect REPL. You don't have to add custom scripts, and you can use `coursier` to launch a fully function REPL for you.
   - It adds REPL integration for to the actual build. This makes it easier to validate the code we add is actually working.
   
   
   ### Why are the changes needed?
   A REPL is arguably the first entry point for a lot of users.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes it adds REPL integration.
   
   ### How was this patch tested?
   Added tests for the command line parsing. Manually tested the REPL.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #40515:
URL: https://github.com/apache/spark/pull/40515#issuecomment-1480034164

   Ya, I'm not against this nice improvement. Just shoot one email to the dev mailing list to give a headup. That's what I'm thinking that we need.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #40515:
URL: https://github.com/apache/spark/pull/40515#issuecomment-1480397842

   @dongjoon-hyun I send the email.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gerashegalov commented on a diff in pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

Posted by "gerashegalov (via GitHub)" <gi...@apache.org>.
gerashegalov commented on code in PR #40515:
URL: https://github.com/apache/spark/pull/40515#discussion_r1146935415


##########
connector/connect/bin/spark-connect-scala-client:
##########
@@ -46,6 +45,4 @@ build/sbt "${SCALA_ARG}" "sql/package;connect-client-jvm/assembly"
 CONNECT_CLASSPATH="$(build/sbt "${SCALA_ARG}" -DcopyDependencies=false "export connect-client-jvm/fullClasspath" | grep jar | tail -n1)"
 SQL_CLASSPATH="$(build/sbt "${SCALA_ARG}" -DcopyDependencies=false "export sql/fullClasspath" | grep jar | tail -n1)"
 
-INIT_SCRIPT="${SPARK_HOME}"/connector/connect/bin/spark-connect-scala-client.sc
-
-exec "${SCALA_BIN}" -cp "$CONNECT_CLASSPATH:$SQL_CLASSPATH" -i $INIT_SCRIPT
+exec java -cp "$CONNECT_CLASSPATH:$SQL_CLASSPATH" org.apache.spark.sql.application.ConnectRepl "$@"

Review Comment:
   did not notice it earlier, Spark scripts prioritize JAVA_HOME https://github.com/apache/spark/blob/04202b3fdda1ac7aea7d7e5688487ce749ad2f79/bin/spark-class#L27-L36 . Maybe there is a way to leverage this script here to begin with?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #40515:
URL: https://github.com/apache/spark/pull/40515#issuecomment-1480398301

   Thank you so much, @hvanhovell .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40515:
URL: https://github.com/apache/spark/pull/40515#issuecomment-1481296666

   Interesting!
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #40515:
URL: https://github.com/apache/spark/pull/40515#issuecomment-1483149265

   Merging this one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell closed pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell closed pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration
URL: https://github.com/apache/spark/pull/40515


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #40515:
URL: https://github.com/apache/spark/pull/40515#issuecomment-1480012982

   @dongjoon-hyun officially is a bit a broad term. As far as I am concerned ammonite is just a way to use the connect JVM client, it is not meant as a change for all of Spark (although I do think it might make sense). We use other libraries all the time, and I don't think we have ever discussed those in the mailing list.
   
   As it stands there are two reasons for using ammonite:
   - It is a better and easier to use REPL than the Scala REPL ever was.
   - It is easier to customize. Connect is a good illustration, for the Scala REPL I will probably need to the same thing as we did for the spark REPL project and that is fork the source code. In ammonite this was a minor modification.
   
   I do intent to add support for the regular REPL at some point, this was just easier to get started with.
   
   If you feel strongly about this then I am happy to send a message to the dev-list.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gerashegalov commented on a diff in pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

Posted by "gerashegalov (via GitHub)" <gi...@apache.org>.
gerashegalov commented on code in PR #40515:
URL: https://github.com/apache/spark/pull/40515#discussion_r1145546706


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientBuilderParseTestSuite.scala:
##########
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.client
+
+import org.apache.spark.sql.connect.client.util.ConnectFunSuite
+
+/**
+ * Test suite for [[SparkConnectClient.Builder]] parsing and configuration.
+ */
+class SparkConnectClientBuilderParseTestSuite extends ConnectFunSuite {
+  private def build(args: String*): SparkConnectClient.Builder = {
+    SparkConnectClient.builder().parse(args.toArray)
+  }
+
+  private def argumentTest(
+      name: String,
+      value: String,
+      extractor: SparkConnectClient.Builder => String): Unit = {
+    test("Argument - " + name) {
+      val builder = build("--" + name, value)
+      assert(value === extractor(builder))
+      val e = intercept[IllegalArgumentException] {
+        build("--" + name)
+      }
+      assert(e.getMessage.contains("option requires a value"))
+    }
+  }
+
+  argumentTest("host", "www.apache.org", _.host)
+  argumentTest("port", "1506", _.port.toString)
+  argumentTest("token", "azbycxdwev1234567890", _.token.get)
+  argumentTest("user_id", "U1238", _.userId.get)
+  argumentTest("user_name", "alice", _.userName.get)
+  argumentTest("user_agent", "MY APP", _.userAgent)
+
+  test("Argument - remote") {
+    val builder =
+      build("--remote", "sc://srv.apache.org/;user_id=x127;user_name=Q;token=nahnah;param1=x")
+    assert(builder.host == "srv.apache.org")

Review Comment:
   here and elsewhere probably should use `===` consistently  like on L35 for more illustrative exception messages? 



##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/application/ConnectRepl.scala:
##########
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.application
+
+import scala.util.control.NonFatal
+
+import ammonite.compiler.CodeClassWrapper
+import ammonite.util.Bind
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.connect.client.{SparkConnectClient, SparkConnectClientParser}
+
+/**
+ * REPL for spark connect.
+ */
+@DeveloperApi
+object ConnectRepl {
+  private val name = "Spark Connect REPL"
+
+  private val splash =
+    """
+      |Spark session available as 'spark'.
+      |   _____                  __      ______                            __
+      |  / ___/____  ____ ______/ /__   / ____/___  ____  ____  ___  _____/ /_
+      |  \__ \/ __ \/ __ `/ ___/ //_/  / /   / __ \/ __ \/ __ \/ _ \/ ___/ __/
+      | ___/ / /_/ / /_/ / /  / ,<    / /___/ /_/ / / / / / / /  __/ /__/ /_
+      |/____/ .___/\__,_/_/  /_/|_|   \____/\____/_/ /_/_/ /_/\___/\___/\__/
+      |    /_/
+      |""".stripMargin
+
+  def main(args: Array[String]): Unit = {
+    // Build the client.
+    val client =
+      try {
+        SparkConnectClient
+          .builder()
+          .loadFromEnvironment()
+          .userAgent(name)
+          .parse(args)
+          .build()
+      } catch {
+        case NonFatal(e) =>
+          // scalastyle:off println
+          println(s"""
+             |$name
+             |${e.getMessage}
+             |${SparkConnectClientParser.usage()}
+             |""".stripMargin)
+          // scalastyle:on println
+          System.exit(1)
+          return

Review Comment:
   ```suggestion
             sys.exit(1)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on a diff in pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #40515:
URL: https://github.com/apache/spark/pull/40515#discussion_r1147863450


##########
connector/connect/bin/spark-connect-scala-client:
##########
@@ -46,6 +45,4 @@ build/sbt "${SCALA_ARG}" "sql/package;connect-client-jvm/assembly"
 CONNECT_CLASSPATH="$(build/sbt "${SCALA_ARG}" -DcopyDependencies=false "export connect-client-jvm/fullClasspath" | grep jar | tail -n1)"
 SQL_CLASSPATH="$(build/sbt "${SCALA_ARG}" -DcopyDependencies=false "export sql/fullClasspath" | grep jar | tail -n1)"
 
-INIT_SCRIPT="${SPARK_HOME}"/connector/connect/bin/spark-connect-scala-client.sc
-
-exec "${SCALA_BIN}" -cp "$CONNECT_CLASSPATH:$SQL_CLASSPATH" -i $INIT_SCRIPT
+exec java -cp "$CONNECT_CLASSPATH:$SQL_CLASSPATH" org.apache.spark.sql.application.ConnectRepl "$@"

Review Comment:
   Sure, let me do that in a follow-up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org