You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2018/02/01 01:19:21 UTC

[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20424#discussion_r165236983
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
    @@ -191,7 +191,20 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
             daemon = pb.start()
     
             val in = new DataInputStream(daemon.getInputStream)
    -        daemonPort = in.readInt()
    +        try {
    +          daemonPort = in.readInt()
    +        } catch {
    +          case exc: EOFException =>
    +            throw new IOException(s"No port number in $daemonModule's stdout")
    +        }
    +
    +        // test that the returned port number is within a valid range.
    +        // note: this does not cover the case where the port number
    +        // is arbitrary data but is also coincidentally within range
    +        if (daemonPort < 1 || daemonPort > 0xffff) {
    --- End diff --
    
    Thanks, @squito. My only worry is, what is the error message like befer / after this change when `daemonPort` happens to be 0 because it was filled in pyspark.daemon's stdout unexpectedly by the sitecustomize or usercustomize whatever?
    
    I assume the error message becomes different as you described above because `java.net.InetSocketAddress.checkPort` passes `0` case but your proposed change also catches `0` case too for the reason you described above.
    
    This case `0` is different because the previous codes probably try to attempt a connection whether it's a valid port or not, after passing `java.net.InetSocketAddress.checkPort`, if I didn't misunderstand.
    
    I want to be very sure on this. Is it guranteed to be always failed without any hole and the error message is always better in this case? How about other ports reserved or "well-known" ? Should we really take this into account here?
    
    Please let me know if I missed something.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org