Posted to issues@spark.apache.org by "Bruce Robbins (JIRA)" <ji...@apache.org> on 2018/01/26 21:17:00 UTC

[jira] [Commented] (SPARK-23240) PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout

    [ https://issues.apache.org/jira/browse/SPARK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341609#comment-16341609 ] 

Bruce Robbins commented on SPARK-23240:
---------------------------------------

I will be making a pull request.

> PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-23240
>                 URL: https://issues.apache.org/jira/browse/SPARK-23240
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.1
>            Reporter: Bruce Robbins
>            Priority: Minor
>
> Environmental issues or site-local customizations (e.g., a sitecustomize.py present in the Python install directory) can interfere with daemon.py’s output to stdout. PythonWorkerFactory produces unhelpful messages when this happens, causing some head scratching before the actual issue is determined.
> Case #1: Extraneous data in pyspark.daemon’s stdout. In this case, PythonWorkerFactory interprets the extraneous data as the daemon’s port number and then throws an exception when creating the socket:
> {noformat}
> java.lang.IllegalArgumentException: port out of range:1819239265
> 	at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
> 	at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
> 	at java.net.Socket.<init>(Socket.java:244)
> 	at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:78)
> {noformat}
> Case #2: No data in pyspark.daemon’s stdout. In this case, PythonWorkerFactory throws an EOFException while reading from the Process input stream.
> The second case is somewhat less mysterious than the first, because PythonWorkerFactory also displays the stderr from the Python process.
> When there is unexpected or missing output in pyspark.daemon’s stdout, PythonWorkerFactory should say so.
>  
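
To illustrate the two cases in the description, here is a minimal Python sketch of the kind of check being asked for. It is not the actual PythonWorkerFactory code (which is Scala) and not the planned fix. It assumes the port arrives on stdout as a 4-byte big-endian integer; the description does not state the wire format. The name read_daemon_port and the wording of the messages are hypothetical.

```python
import io
import struct

MAX_PORT = 65535  # highest valid TCP port


def read_daemon_port(stream):
    """Read a 4-byte big-endian int from stream and validate it as a port.

    Hypothetical helper: the 4-byte big-endian format is an assumption,
    not something the issue description confirms.
    """
    raw = stream.read(4)
    if len(raw) < 4:
        # Case #2: stdout was empty or closed before a port was written.
        raise ValueError(
            "No port number in daemon stdout (got %d of 4 bytes)" % len(raw))
    (port,) = struct.unpack("!i", raw)
    if not 1 <= port <= MAX_PORT:
        # Case #1: something other than a port number reached stdout first.
        raise ValueError(
            "Bad data in daemon stdout: expected a port number, got %d "
            "(first bytes: %r)" % (port, raw))
    return port


# Four ASCII bytes read as a big-endian int give an out-of-range "port".
bogus_value = struct.unpack("!i", b"loca")[0]
```

Under that assumption, the four ASCII bytes "loca" decode to 1819239265, the same number reported in the "port out of range" message above. What the daemon actually printed in that run is not known from the report, but it shows how stray text on stdout can turn into an implausible port number, and why echoing the raw bytes in the error message would help diagnosis.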



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)