Posted to commits@airflow.apache.org by "Bolke de Bruin (JIRA)" <ji...@apache.org> on 2017/07/14 10:22:00 UTC

[jira] [Resolved] (AIRFLOW-1255) SparkSubmitOperator logs do not stream correctly

     [ https://issues.apache.org/jira/browse/AIRFLOW-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bolke de Bruin resolved AIRFLOW-1255.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.8.3

Issue resolved by pull request #2438
[https://github.com/apache/incubator-airflow/pull/2438]

> SparkSubmitOperator logs do not stream correctly
> ------------------------------------------------
>
>                 Key: AIRFLOW-1255
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1255
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks, operators
>    Affects Versions: Airflow 1.8
>         Environment: Spark 1.6.0 with Yarn cluster
> Airflow 1.8
>            Reporter: Himanshu Jain
>            Priority: Minor
>              Labels: easyfix
>             Fix For: 1.8.3
>
>
> Logging in SparkSubmitOperator does not work as intended (continuous streaming of output as it is received from the subprocess). This is because spark-submit internally redirects all logs, including stderr, to stdout, which causes the current two-iterator logging to block on the empty stderr pipe. The logs are therefore written only when the subprocess finishes, which also means yarn_application_id is not available until the end of the application.
>  Specifically,
> {code:title=spark_submit_hook.py (lines 217-220)|borderStyle=solid}
> self._sp = subprocess.Popen(spark_submit_cmd,
>                                 stdout=subprocess.PIPE,
>                                 stderr=subprocess.PIPE,
>                                 **kwargs)
> {code}
> needs to be changed to 
> {code:title=spark_submit_hook.py|borderStyle=solid}
> self._sp = subprocess.Popen(spark_submit_cmd,
>                                 stdout=subprocess.PIPE,
>                                 **kwargs)
> {code}
> with corresponding changes to the lines that follow.
> I have not tested whether the issue also exists with Spark 2.x.
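A minimal sketch of the single-pipe streaming approach the report describes: merge the child's stderr into stdout and read that one pipe line by line while the process runs, instead of iterating two pipes. A throwaway child process stands in for the real spark-submit command here, and the `lines` list stands in for the hook's logger; both are illustrative assumptions, not the actual spark_submit_hook.py code.

```python
import subprocess
import sys

# Throwaway child that writes to both stdout and stderr, standing in for
# a spark-submit invocation.
cmd = [sys.executable, "-c",
       "import sys; print('line1'); print('err1', file=sys.stderr)"]

sp = subprocess.Popen(cmd,
                      stdout=subprocess.PIPE,
                      stderr=subprocess.STDOUT,   # fold stderr into stdout
                      universal_newlines=True)

lines = []
# Stream the single merged pipe as output arrives; with two PIPEs and two
# iterators, the empty stderr pipe would block this loop until exit.
for line in iter(sp.stdout.readline, ""):
    lines.append(line.rstrip())  # the real hook would log each line here

sp.wait()
print(lines)
```

With a single merged pipe there is nothing left to block on, so lines such as the YARN application id can be parsed as soon as spark-submit prints them rather than after the job completes.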



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)