Posted to commits@airflow.apache.org by "Ash Berlin-Taylor (JIRA)" <ji...@apache.org> on 2019/04/12 10:29:00 UTC

[jira] [Updated] (AIRFLOW-4289) spark_binary argument in SparkSubmitHook is ignored when building the connection_cmd

     [ https://issues.apache.org/jira/browse/AIRFLOW-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ash Berlin-Taylor updated AIRFLOW-4289:
---------------------------------------
    Fix Version/s: 1.10.4

> spark_binary argument in SparkSubmitHook is ignored when building the connection_cmd
> ------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-4289
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4289
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: contrib, hooks
>    Affects Versions: 1.10.3
>            Reporter: Luiz Svoboda
>            Priority: Minor
>              Labels: usability
>             Fix For: 1.10.4
>
>
> When using the SparkSubmitOperator, the _spark_binary_ parameter can be specified, but its value is ignored when the _connection_cmd_ is built. Instead, the binary is read from the connection parameters, defaulting to _spark-submit_, as can be seen in [spark_submit_hook line:190|https://github.com/apache/airflow/blob/1.10.3/airflow/contrib/hooks/spark_submit_hook.py#L190].
> This configuration is also confusing, because the user can set the binary either via the _connection_ or directly when creating the operator instance. I suggest keeping only one option; IMHO the connection approach seems better, as it is already used to configure some other options.
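
The behaviour described in the report can be illustrated with a small standalone sketch. This is not the Airflow source: the function names, the "spark-binary" key in the connection extras, and the "--master" handling are assumptions made for illustration only. It contrasts the reported lookup, which never consults the argument, with one possible precedence in which the argument wins when it is set.

```python
# Standalone model of the reported behaviour; NOT the actual Airflow code.
# All names below are hypothetical and chosen for illustration.

DEFAULT_BINARY = "spark-submit"


def reported_binary(spark_binary_arg, conn_extra):
    """Lookup as described in the report: the argument is never consulted."""
    # Assumption: the connection extras carry the binary under "spark-binary".
    return conn_extra.get("spark-binary", DEFAULT_BINARY)


def argument_first_binary(spark_binary_arg, conn_extra):
    """One possible precedence: argument, then connection, then default."""
    return spark_binary_arg or conn_extra.get("spark-binary", DEFAULT_BINARY)


def build_connection_cmd(binary, master="yarn"):
    """Build the start of the command line from the resolved binary."""
    return [binary, "--master", master]


if __name__ == "__main__":
    extra = {}  # connection without a configured binary
    print(build_connection_cmd(reported_binary("spark2-submit", extra)))
    # ['spark-submit', '--master', 'yarn']  (argument ignored)
    print(build_connection_cmd(argument_first_binary("spark2-submit", extra)))
    # ['spark2-submit', '--master', 'yarn']  (argument honoured)
```

The reporter's own suggestion goes the other way, keeping only the connection setting; the sketch just makes the two lookups easy to compare.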


