You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "dillitz (via GitHub)" <gi...@apache.org> on 2023/10/10 16:01:50 UTC

[PR] [SPARK-45486][CONNECT] User agent improvements: Use SPARK_CONNECT_USER_AGENT env variable and include environment specific attributes [spark]

dillitz opened a new pull request, #43313:
URL: https://github.com/apache/spark/pull/43313

   ### What changes were proposed in this pull request?
   With this PR similar to the[ Python client](https://github.com/apache/spark/blob/2cc1ee4d3a05a641d7a245f015ef824d8f7bae8b/python/pyspark/sql/connect/client/core.py#L284) the Scala client's user agent now:
   
   1. Uses the SPARK_CONNECT_USER_AGENT environment variable if set
   2. Includes the OS, JVM version, Scala version, and Spark version
   
   ### Why are the changes needed?
   Feature parity with the Python client. Better observability of Scala Spark Connect clients.
   
   
   ### Does this PR introduce _any_ user-facing change?
   By default, the user agent string now contains more useful information.
   Before:
   `_SPARK_CONNECT_SCALA`
   After:
   `_SPARK_CONNECT_SCALA spark/4.0.0-SNAPSHOT scala/2.13.12 jvm/17.0.8.1 os/darwin`
   
   
   ### How was this patch tested?
   Tests added & adjusted.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45485][CONNECT] User agent improvements: Use SPARK_CONNECT_USER_AGENT env variable and include environment specific attributes [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43313:
URL: https://github.com/apache/spark/pull/43313#discussion_r1360283998


##########
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClient.scala:
##########
@@ -585,6 +587,27 @@ object SparkConnectClient {
     def build(): SparkConnectClient = _configuration.toSparkConnectClient
   }
 
+  /**
+   * Appends the Spark, Scala & JVM version, and the used OS to the user-provided user agent.
+   */
+  private def genUserAgent(value: String): String = {
+    val scalaVersion = Properties.versionNumberString
+    val jvmVersion = System.getProperty("java.version").split("_")(0)
+    val osName = {
+      val os = System.getProperty("os.name").toLowerCase(Locale.ROOT)
+      if (os.contains("mac")) "darwin"
+      else if (os.contains("linux")) "linux"
+      else if (os.contains("win")) "windows"
+      else "unknown"
+    }
+    List(
+      value,
+      s"spark/$SPARK_VERSION",
+      s"scala/$scalaVersion",
+      s"jvm/$jvmVersion",
+      s"os/$osName").mkString(" ")

Review Comment:
   Hm, might not be a big deal but wonder if this is compatible with Python's `platform.uname().system.lower()`. At least, my Mac shows a different one `darwin` in Python, and `mac os x` in here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45485][CONNECT] User agent improvements: Use SPARK_CONNECT_USER_AGENT env variable and include environment specific attributes [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43313:
URL: https://github.com/apache/spark/pull/43313#issuecomment-1763935656

   My bad, there was a PR merged with a wrong PR ticket. I switched the ticket each other.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45485][CONNECT] User agent improvements: Use SPARK_CONNECT_USER_AGENT env variable and include environment specific attributes [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43313:
URL: https://github.com/apache/spark/pull/43313#issuecomment-1766136291

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45485][CONNECT] User agent improvements: Use SPARK_CONNECT_USER_AGENT env variable and include environment specific attributes [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43313:
URL: https://github.com/apache/spark/pull/43313#discussion_r1360284557


##########
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClient.scala:
##########
@@ -585,6 +587,27 @@ object SparkConnectClient {
     def build(): SparkConnectClient = _configuration.toSparkConnectClient
   }
 
+  /**
+   * Appends the Spark, Scala & JVM version, and the used OS to the user-provided user agent.
+   */
+  private def genUserAgent(value: String): String = {
+    val scalaVersion = Properties.versionNumberString
+    val jvmVersion = System.getProperty("java.version").split("_")(0)
+    val osName = {
+      val os = System.getProperty("os.name").toLowerCase(Locale.ROOT)
+      if (os.contains("mac")) "darwin"
+      else if (os.contains("linux")) "linux"
+      else if (os.contains("win")) "windows"
+      else "unknown"
+    }
+    List(
+      value,
+      s"spark/$SPARK_VERSION",
+      s"scala/$scalaVersion",
+      s"jvm/$jvmVersion",
+      s"os/$osName").mkString(" ")

Review Comment:
   Might not be a big deal but would be great to keep them same if it's easy to make it consistent



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45485][CONNECT] User agent improvements: Use SPARK_CONNECT_USER_AGENT env variable and include environment specific attributes [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43313:
URL: https://github.com/apache/spark/pull/43313#discussion_r1360289233


##########
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClient.scala:
##########
@@ -585,6 +587,27 @@ object SparkConnectClient {
     def build(): SparkConnectClient = _configuration.toSparkConnectClient
   }
 
+  /**
+   * Appends the Spark, Scala & JVM version, and the used OS to the user-provided user agent.
+   */
+  private def genUserAgent(value: String): String = {
+    val scalaVersion = Properties.versionNumberString
+    val jvmVersion = System.getProperty("java.version").split("_")(0)
+    val osName = {
+      val os = System.getProperty("os.name").toLowerCase(Locale.ROOT)
+      if (os.contains("mac")) "darwin"
+      else if (os.contains("linux")) "linux"
+      else if (os.contains("win")) "windows"
+      else "unknown"
+    }
+    List(
+      value,
+      s"spark/$SPARK_VERSION",
+      s"scala/$scalaVersion",
+      s"jvm/$jvmVersion",
+      s"os/$osName").mkString(" ")

Review Comment:
   BTW, I think we should have used other ser/de format like JSON ... other values here can easily contain whitespaces cc @nija-at 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45485][CONNECT] User agent improvements: Use SPARK_CONNECT_USER_AGENT env variable and include environment specific attributes [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43313:
URL: https://github.com/apache/spark/pull/43313#discussion_r1360285771


##########
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClient.scala:
##########
@@ -585,6 +587,27 @@ object SparkConnectClient {
     def build(): SparkConnectClient = _configuration.toSparkConnectClient
   }
 
+  /**
+   * Appends the Spark, Scala & JVM version, and the used OS to the user-provided user agent.
+   */
+  private def genUserAgent(value: String): String = {
+    val scalaVersion = Properties.versionNumberString
+    val jvmVersion = System.getProperty("java.version").split("_")(0)
+    val osName = {
+      val os = System.getProperty("os.name").toLowerCase(Locale.ROOT)
+      if (os.contains("mac")) "darwin"
+      else if (os.contains("linux")) "linux"
+      else if (os.contains("win")) "windows"
+      else "unknown"
+    }
+    List(
+      value,
+      s"spark/$SPARK_VERSION",
+      s"scala/$scalaVersion",
+      s"jvm/$jvmVersion",
+      s"os/$osName").mkString(" ")

Review Comment:
   Ah, okie. you're doing it above 598L



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45485][CONNECT] User agent improvements: Use SPARK_CONNECT_USER_AGENT env variable and include environment specific attributes [spark]

Posted by "dillitz (via GitHub)" <gi...@apache.org>.
dillitz commented on code in PR #43313:
URL: https://github.com/apache/spark/pull/43313#discussion_r1360307549


##########
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClient.scala:
##########
@@ -585,6 +587,27 @@ object SparkConnectClient {
     def build(): SparkConnectClient = _configuration.toSparkConnectClient
   }
 
+  /**
+   * Appends the Spark, Scala & JVM version, and the used OS to the user-provided user agent.
+   */
+  private def genUserAgent(value: String): String = {
+    val scalaVersion = Properties.versionNumberString
+    val jvmVersion = System.getProperty("java.version").split("_")(0)
+    val osName = {
+      val os = System.getProperty("os.name").toLowerCase(Locale.ROOT)
+      if (os.contains("mac")) "darwin"
+      else if (os.contains("linux")) "linux"
+      else if (os.contains("win")) "windows"
+      else "unknown"
+    }
+    List(
+      value,
+      s"spark/$SPARK_VERSION",
+      s"scala/$scalaVersion",
+      s"jvm/$jvmVersion",
+      s"os/$osName").mkString(" ")

Review Comment:
   OS should be consistent for Mac, Windows, and Linux. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45485][CONNECT] User agent improvements: Use SPARK_CONNECT_USER_AGENT env variable and include environment specific attributes [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #43313: [SPARK-45485][CONNECT] User agent improvements: Use SPARK_CONNECT_USER_AGENT env variable and include environment specific attributes
URL: https://github.com/apache/spark/pull/43313


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45485][CONNECT] User agent improvements: Use SPARK_CONNECT_USER_AGENT env variable and include environment specific attributes [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43313:
URL: https://github.com/apache/spark/pull/43313#issuecomment-1766136029

   Lemme merge this as it seems pretty consistent with the exiting codebase.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45485][CONNECT] User agent improvements: Use SPARK_CONNECT_USER_AGENT env variable and include environment specific attributes [spark]

Posted by "dillitz (via GitHub)" <gi...@apache.org>.
dillitz commented on PR #43313:
URL: https://github.com/apache/spark/pull/43313#issuecomment-1766057705

   FYI @grundprinzip 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org