Posted to commits@airflow.apache.org by "Kengo Seki (Jira)" <ji...@apache.org> on 2020/03/10 01:14:00 UTC
[jira] [Created] (AIRFLOW-7026) Improve SparkSqlHook's error message
Kengo Seki created AIRFLOW-7026:
-----------------------------------
Summary: Improve SparkSqlHook's error message
Key: AIRFLOW-7026
URL: https://issues.apache.org/jira/browse/AIRFLOW-7026
Project: Apache Airflow
Issue Type: Improvement
Components: hooks
Affects Versions: 1.10.9
Reporter: Kengo Seki
Assignee: Kengo Seki
If {{SparkSqlHook.run_query()}} fails, it raises the following exception.
{code}
        if returncode:
            raise AirflowException(
                "Cannot execute {} on {}. Process exit code: {}.".format(
                    cmd, self._conn.host, returncode
                )
            )
{code}
However, this message is not actually very useful. For example:
{code}
In [1]: from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator
In [2]: SparkSqlOperator(sql="SELECT * FROM NON_EXISTENT_TABLE", master="local[*]", conn_id="spark_default", task_id="_").execute(None)
(snip)
---------------------------------------------------------------------------
AirflowException Traceback (most recent call last)
<ipython-input-2-d69c4454e999> in <module>
----> 1 SparkSqlOperator(sql="SELECT * FROM NON_EXISTENT_TABLE", master="local[*]", conn_id="spark_default", task_id="_").execute(None)
~/repos/incubator-airflow/airflow/providers/apache/spark/operators/spark_sql.py in execute(self, context)
105 yarn_queue=self._yarn_queue
106 )
--> 107 self._hook.run_query()
108
109 def on_kill(self):
~/repos/incubator-airflow/airflow/providers/apache/spark/hooks/spark_sql.py in run_query(self, cmd, **kwargs)
154 raise AirflowException(
155 "Cannot execute {} on {}. Process exit code: {}.".format(
--> 156 cmd, self._conn.host, returncode
157 )
158 )
AirflowException: Cannot execute on yarn. Process exit code: 1.
{code}
Most users would expect the executed query to be shown as the first argument in the exception message and the "master" value (i.e., "local[*]" here) as the second, but meaningless information (an empty string and "yarn") is shown instead.
The reasons are as follows:
* The executed query is specified by the "sql" parameter for the {{SparkSqlHook.\_\_init__}} method, not by {{cmd}}.
* The "master" value is specified by the "master" parameter for the {{SparkSqlHook.\_\_init__}} method, not by {{self._conn.host}}. Actually, {{self._conn}} is not used at all in SparkSqlHook.
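One possible fix, sketched below, would be to build the message from those constructor values instead. The attribute names {{self._sql}} and {{self._master}} are assumptions based on the constructor's parameter names, and the standalone helper function is only for illustration:

{code}
# Hypothetical sketch: format the error from the query and the "master"
# value rather than from cmd and self._conn.host. Inside the actual hook
# this would read self._sql and self._master (assumed attribute names).
def format_error(sql, master, returncode):
    return "Cannot execute {} on {}. Process exit code: {}.".format(
        sql, master, returncode
    )

# With the failing example above, the message would become:
msg = format_error("SELECT * FROM NON_EXISTENT_TABLE", "local[*]", 1)
# "Cannot execute SELECT * FROM NON_EXISTENT_TABLE on local[*]. Process exit code: 1."
{code}

This way the exception names the query that failed and the cluster manager it ran against, which is what the traceback above fails to convey.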
--
This message was sent by Atlassian Jira
(v8.3.4#803005)