Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/04/28 11:21:13 UTC

[jira] [Resolved] (SPARK-7898) pyspark merges stderr into stdout

     [ https://issues.apache.org/jira/browse/SPARK-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-7898.
------------------------------
    Resolution: Not A Problem

I think this is by design. You're saying that all of the user code's output is shunted to stdout because PySpark itself uses stderr for its own output, which isn't user program output. I think that's sensible.

You would never want to rely on this behavior for your program. If you need to use the output of a binary, use a piped RDD or similar.
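
As a rough sketch of the "or similar" option: rather than depending on where inherited stdout/stderr happen to point, the driver can read both streams of the child through explicit pipes. The helper name run_and_capture and the stand-in child command are made up for illustration; in a real job the argument list would be the actual binary (for example the hadoop command from the report). Note the assumption: this only keeps the streams apart if the child really writes its logs to stderr. It cannot help if the child's own logging configuration sends log lines to stdout. Inside a Spark job, RDD.pipe plays a similar role for per-partition data.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run argv and return (exit_code, stdout_text, stderr_text).

    Both streams are read through explicit pipes, so the result does not
    depend on how the parent process has wired up its own stdout/stderr.
    (Hypothetical helper, not part of PySpark.)
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # communicate() drains both pipes concurrently, avoiding a deadlock
    # when the child fills one pipe while we are blocked on the other.
    out, err = proc.communicate()
    return proc.returncode, out, err


# Stand-in child process that writes one line to each stream.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write('data\\n'); sys.stderr.write('log\\n')",
]
code, out, err = run_and_capture(child)
```

Here out holds only what the child wrote to its stdout and err only what it wrote to its stderr, so the program can use the data without parsing around log lines.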

> pyspark merges stderr into stdout
> ---------------------------------
>
>                 Key: SPARK-7898
>                 URL: https://issues.apache.org/jira/browse/SPARK-7898
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.3.0
>            Reporter: Sam Steingold
>
> When I type 
> {code}
> hadoop fs -text /foo/bar/baz.bz2 2>err 1>out
> {code}
> I get two non-empty files: {{err}} with 
> {code}
> 2015-05-26 15:33:49,786 INFO  [main] bzip2.Bzip2Factory (Bzip2Factory.java:isNativeBzip2Loaded(70)) - Successfully loaded & initialized native-bzip2 library system-native
> 2015-05-26 15:33:49,789 INFO  [main] compress.CodecPool (CodecPool.java:getDecompressor(179)) - Got brand-new decompressor [.bz2]
> {code}
> and {{out}} with the content of the file (as expected).
> When I call the same command from Python (2.6):
> {code}
> from subprocess import Popen
> with open("out","w") as out:
>     with open("err","w") as err:
>         p = Popen(['hadoop','fs','-text',"/foo/bar/baz.bz2"],
>                   stdin=None,stdout=out,stderr=err)
> print p.wait()
> {code}
> I get the exact same (correct) behavior.
> *However*, when I run the same code under *PySpark* (or using {{spark-submit}}), I get an *empty* {{err}} file and the {{out}} file starts with the log messages above (and then it contains the actual data).



