Posted to commits@airflow.apache.org by "Kengo Seki (Jira)" <ji...@apache.org> on 2020/03/10 01:14:00 UTC

[jira] [Created] (AIRFLOW-7026) Improve SparkSqlHook's error message

Kengo Seki created AIRFLOW-7026:
-----------------------------------

             Summary: Improve SparkSqlHook's error message
                 Key: AIRFLOW-7026
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-7026
             Project: Apache Airflow
          Issue Type: Improvement
          Components: hooks
    Affects Versions: 1.10.9
            Reporter: Kengo Seki
            Assignee: Kengo Seki


If {{SparkSqlHook.run_query()}} fails, it raises the following exception.

{code}
        if returncode:
            raise AirflowException(
                "Cannot execute {} on {}. Process exit code: {}.".format(
                    cmd, self._conn.host, returncode
                )
            )
{code}

But this message is actually not very useful. For example:

{code}
In [1]: from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator                                                                                      

In [2]: SparkSqlOperator(sql="SELECT * FROM NON_EXISTENT_TABLE", master="local[*]", conn_id="spark_default", task_id="_").execute(None)                                      

(snip)

---------------------------------------------------------------------------
AirflowException                          Traceback (most recent call last)
<ipython-input-2-d69c4454e999> in <module>
----> 1 SparkSqlOperator(sql="SELECT * FROM NON_EXISTENT_TABLE", master="local[*]", conn_id="spark_default", task_id="_").execute(None)

~/repos/incubator-airflow/airflow/providers/apache/spark/operators/spark_sql.py in execute(self, context)
    105                                   yarn_queue=self._yarn_queue
    106                                   )
--> 107         self._hook.run_query()
    108 
    109     def on_kill(self):

~/repos/incubator-airflow/airflow/providers/apache/spark/hooks/spark_sql.py in run_query(self, cmd, **kwargs)
    154             raise AirflowException(
    155                 "Cannot execute {} on {}. Process exit code: {}.".format(
--> 156                     cmd, self._conn.host, returncode
    157                 )
    158             )

AirflowException: Cannot execute  on yarn. Process exit code: 1.
{code}

Most users would expect the executed query to be shown as the first argument of the exception and the "master" value (i.e., "local[*]" here) as the second, but meaningless information (an empty string and "yarn") is shown instead.
The reasons are as follows:

* The executed query is specified by the "sql" parameter of the {{SparkSqlHook.\_\_init__}} method, not by {{cmd}}.
* The "master" value is specified by the "master" parameter of the {{SparkSqlHook.\_\_init__}} method, not by {{self._conn.host}}. In fact, {{self._conn}} is not used at all in SparkSqlHook.
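If the exception were built from the values the hook actually received, the message would be meaningful. A minimal sketch of the idea (the helper function and the attribute names {{self._sql}} / {{self._master}} are illustrative, not the hook's verified internals):

{code}
# Hypothetical sketch: build the error message from the "sql" and "master"
# values passed to SparkSqlHook.__init__, instead of cmd (an empty string)
# and self._conn.host ("yarn").

class AirflowException(Exception):
    """Stand-in for airflow.exceptions.AirflowException."""


def build_error_message(sql, master, returncode):
    # Report the failed query and the actual master so the user can see
    # what was run and where it was submitted.
    return "Cannot execute '{}' on {}. Process exit code: {}.".format(
        sql, master, returncode
    )
{code}

With a change along these lines, the example above would report the query "SELECT * FROM NON_EXISTENT_TABLE" and the master "local[*]" instead of an empty string and "yarn".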



--
This message was sent by Atlassian Jira
(v8.3.4#803005)