Posted to issues@spark.apache.org by "Sam Steingold (JIRA)" <ji...@apache.org> on 2015/05/28 15:38:28 UTC
[jira] [Updated] (SPARK-7898) pyspark merges stderr into stdout
[ https://issues.apache.org/jira/browse/SPARK-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Steingold updated SPARK-7898:
---------------------------------
Description:
When I type
{code}
hadoop fs -text /foo/bar/baz.bz2 2>err 1>out
{code}
I get two non-empty files: {{err}} with
{code}
2015-05-26 15:33:49,786 INFO [main] bzip2.Bzip2Factory (Bzip2Factory.java:isNativeBzip2Loaded(70)) - Successfully loaded & initialized native-bzip2 library system-native
2015-05-26 15:33:49,789 INFO [main] compress.CodecPool (CodecPool.java:getDecompressor(179)) - Got brand-new decompressor [.bz2]
{code}
and {{out}} with the content of the file (as expected).
When I call the same command from Python (2.6):
{code}
from subprocess import Popen
with open("out","w") as out:
    with open("err","w") as err:
        p = Popen(['hadoop','fs','-text',"/foo/bar/baz.bz2"],
                  stdin=None, stdout=out, stderr=err)
        print p.wait()
{code}
I get the exact same (correct) behavior.
*However*, when I run the same code under *PySpark* (or using {{spark-submit}}), I get an *empty* {{err}} file and the {{out}} file starts with the log messages above (and then it contains the actual data).
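Outside Spark, the separation described above can be reproduced with any child process, not just {{hadoop fs}}. The sketch below (Python 3; the child command is a stand-in I made up to write one byte string to each stream) demonstrates the behavior the report says disappears under PySpark:

```python
import subprocess
import sys
import tempfile

# Stand-in for `hadoop fs -text ...`: a child that writes to both streams.
child = "import sys; sys.stdout.write('data'); sys.stderr.write('log')"

with tempfile.TemporaryFile() as out, tempfile.TemporaryFile() as err:
    # Redirect the child's stdout and stderr to two separate files,
    # exactly as the Popen call in the report does.
    subprocess.check_call([sys.executable, "-c", child],
                          stdin=None, stdout=out, stderr=err)
    out.seek(0)
    err.seek(0)
    out_bytes, err_bytes = out.read(), err.read()

# Observed outside PySpark: out_bytes holds only the stdout data and
# err_bytes only the stderr log line. Under PySpark, per this report,
# both would end up in out_bytes and err_bytes would be empty.
```

If the streams really are merged by the time user code runs, the merge must happen in the driver process itself (the child simply inherits or is handed the already-redirected descriptors), which is why the plain-Python version of the same code behaves correctly.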
was:
When I type
{code}
hadoop fs -text /foo/bar/baz.bz2 2>err 1>out
{code}
I get two non-empty files: {{err}} with
{code}
2015-05-26 15:33:49,786 INFO [main] bzip2.Bzip2Factory (Bzip2Factory.java:isNativeBzip2Loaded(70)) - Successfully loaded & initialized native-bzip2 library system-native
2015-05-26 15:33:49,789 INFO [main] compress.CodecPool (CodecPool.java:getDecompressor(179)) - Got brand-new decompressor [.bz2]
{code}
and {{out}} with the content of the file (as expected).
When I call the same command from Python (2.6):
{code}
from subprocess import Popen
with open("out","w") as out:
    with open("err","w") as err:
        p = Popen(['hadoop','fs','-text',"/foo/bar/baz.bz2"],
                  stdin=None, stdout=out, stderr=err)
        print p.wait()
{code}
I get the exact same (correct) behavior.
*However*, when I run the same code under *PySpark* (or using `spark-submit`), I get an *empty* {{err}} file and the {{out}} file starts with the log messages above (and then it contains the actual data).
> pyspark merges stderr into stdout
> ---------------------------------
>
> Key: SPARK-7898
> URL: https://issues.apache.org/jira/browse/SPARK-7898
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.3.0
> Reporter: Sam Steingold
>
> When I type
> {code}
> hadoop fs -text /foo/bar/baz.bz2 2>err 1>out
> {code}
> I get two non-empty files: {{err}} with
> {code}
> 2015-05-26 15:33:49,786 INFO [main] bzip2.Bzip2Factory (Bzip2Factory.java:isNativeBzip2Loaded(70)) - Successfully loaded & initialized native-bzip2 library system-native
> 2015-05-26 15:33:49,789 INFO [main] compress.CodecPool (CodecPool.java:getDecompressor(179)) - Got brand-new decompressor [.bz2]
> {code}
> and {{out}} with the content of the file (as expected).
> When I call the same command from Python (2.6):
> {code}
> from subprocess import Popen
> with open("out","w") as out:
>     with open("err","w") as err:
>         p = Popen(['hadoop','fs','-text',"/foo/bar/baz.bz2"],
>                   stdin=None, stdout=out, stderr=err)
>         print p.wait()
> {code}
> I get the exact same (correct) behavior.
> *However*, when I run the same code under *PySpark* (or using {{spark-submit}}), I get an *empty* {{err}} file and the {{out}} file starts with the log messages above (and then it contains the actual data).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org