Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/19 10:56:26 UTC

[GitHub] [spark] HyukjinKwon edited a comment on pull request #30389: [SPARK-33143][PYTHON] Add configurable timeout to python server and client

HyukjinKwon edited a comment on pull request #30389:
URL: https://github.com/apache/spark/pull/30389#issuecomment-730281020


   @gaborgsomogyi, I locally tried some other standard approaches, such as passing the value through properly as an argument, but the changes became rather large and invasive. What do you think about just passing the value via an environment variable in SparkContext for now?
   
   ```diff
   diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
   index 527d0d6d3a4..33849f6fcb6 100644
   --- a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
   +++ b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
   @@ -85,4 +85,8 @@ private[spark] object PythonUtils {
      def getBroadcastThreshold(sc: JavaSparkContext): Long = {
        sc.conf.get(org.apache.spark.internal.config.BROADCAST_FOR_UDF_COMPRESSION_THRESHOLD)
      }
   +
   +  def getPythonAuthSocketTimeout(sc: JavaSparkContext): Long = {
   +    sc.conf.get(org.apache.spark.internal.config.Python.PYTHON_AUTH_SOCKET_TIMEOUT)
   +  }
    }
   diff --git a/python/pyspark/context.py b/python/pyspark/context.py
   index 9c9e3f4b3c8..8956e163000 100644
   --- a/python/pyspark/context.py
   +++ b/python/pyspark/context.py
   @@ -222,6 +222,7 @@ class SparkContext(object):
            # data via a socket.
            # scala's mangled names w/ $ in them require special treatment.
            self._encryption_enabled = self._jvm.PythonUtils.isEncryptionEnabled(self._jsc)
   +        os.environ["SPARK_AUTH_SOCKET_TIMEOUT"] = str(self._jvm.PythonUtils.getPythonAuthSocketTimeout(self._jsc))
   
            self.pythonExec = os.environ.get("PYSPARK_PYTHON", 'python')
            self.pythonVer = "%d.%d" % sys.version_info[:2]
   ```
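   
   For context, here is a minimal sketch (not part of the diff above) of how the Python side could consume that environment variable; the helper names and the 15-second fallback are assumptions for illustration, not Spark's actual code:
   
   ```python
   import os
   import socket


   def _auth_socket_timeout():
       # SPARK_AUTH_SOCKET_TIMEOUT is set by SparkContext in the diff above;
       # fall back to a hypothetical 15-second default when it is absent.
       return float(os.environ.get("SPARK_AUTH_SOCKET_TIMEOUT", "15"))


   def connect_with_timeout(port):
       # Bound the blocking connect/auth handshake so a hung JVM-side server
       # cannot stall the Python client forever.
       sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
       sock.settimeout(_auth_socket_timeout())
       sock.connect(("127.0.0.1", port))
       return sock
   ```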
   
   This auth will only be triggered by RDD and DataFrame APIs such as `collect`, and by broadcasting on the driver side. So I think it's safe to assume a SparkContext is always running whenever we need to know the timeout.
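   
   In other words, a user would tune this through the regular config mechanism; assuming the Scala constant above is exposed under a key like `spark.python.authenticate.socketTimeout` (an assumption on my side here), usage would look roughly like:
   
   ```python
   from pyspark.sql import SparkSession

   # Hypothetical usage: the config key below is assumed to map to
   # org.apache.spark.internal.config.Python.PYTHON_AUTH_SOCKET_TIMEOUT.
   spark = (SparkSession.builder
            .config("spark.python.authenticate.socketTimeout", "30s")
            .getOrCreate())

   # Any driver-side socket exchange triggered by e.g. collect() would then
   # pick the configured timeout up via SPARK_AUTH_SOCKET_TIMEOUT.
   spark.range(10).collect()
   ```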


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org