Posted to commits@airflow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/03/17 14:29:00 UTC

[jira] [Commented] (AIRFLOW-7026) Improve SparkSqlHook's error message

    [ https://issues.apache.org/jira/browse/AIRFLOW-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060951#comment-17060951 ] 

ASF GitHub Bot commented on AIRFLOW-7026:
-----------------------------------------

sekikn commented on pull request #7749: [AIRFLOW-7026] Improve SparkSqlHook's error message
URL: https://github.com/apache/airflow/pull/7749
 
 
   * Replace self._conn.host in the error message with
     self._master, because the former is actually unused
     in SparkSqlHook.
   
   * Add self._sql to the error message, because it is
     the executed query or a file that contains it.
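   
   A minimal sketch of the resulting message, using the hook's `self._sql` and `self._master` attributes as described above. The helper function below is hypothetical and used only for illustration (SparkSqlHook raises AirflowException inline); the exact wording of the merged change may differ.
   
   ```python
   # Sketch only: _format_query_error is a hypothetical helper for
   # illustration; it mirrors the format string from run_query() with
   # self._conn.host swapped for the master URL and the SQL added.
   class AirflowException(Exception):
       """Stand-in for airflow.exceptions.AirflowException."""
   
   def _format_query_error(sql, master, returncode):
       # Report the query (or SQL file) and the actual master URL,
       # instead of the unused connection host.
       return "Cannot execute {} on {}. Process exit code: {}.".format(
           sql, master, returncode
       )
   
   print(_format_query_error("SELECT * FROM NON_EXISTENT_TABLE", "local[*]", 1))
   # -> Cannot execute SELECT * FROM NON_EXISTENT_TABLE on local[*]. Process exit code: 1.
   ```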
   
   ---
   Issue link: WILL BE INSERTED BY [boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = JIRA ID<sup>*</sup>
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   <sup>*</sup> For document-only changes commit message can start with `[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Improve SparkSqlHook's error message
> ------------------------------------
>
>                 Key: AIRFLOW-7026
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-7026
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hooks
>    Affects Versions: 1.10.9
>            Reporter: Kengo Seki
>            Assignee: Kengo Seki
>            Priority: Major
>
> If {{SparkSqlHook.run_query()}} fails, it raises the following exception.
> {code}
>         if returncode:
>             raise AirflowException(
>                 "Cannot execute {} on {}. Process exit code: {}.".format(
>                     cmd, self._conn.host, returncode
>                 )
>             )
> {code}
> But this message is not so useful actually. For example:
> {code}
> In [1]: from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator                                                                                      
> In [2]: SparkSqlOperator(sql="SELECT * FROM NON_EXISTENT_TABLE", master="local[*]", conn_id="spark_default", task_id="_").execute(None)                                      
> (snip)
> ---------------------------------------------------------------------------
> AirflowException                          Traceback (most recent call last)
> <ipython-input-2-d69c4454e999> in <module>
> ----> 1 SparkSqlOperator(sql="SELECT * FROM NON_EXISTENT_TABLE", master="local[*]", conn_id="spark_default", task_id="_").execute(None)
> ~/repos/incubator-airflow/airflow/providers/apache/spark/operators/spark_sql.py in execute(self, context)
>     105                                   yarn_queue=self._yarn_queue
>     106                                   )
> --> 107         self._hook.run_query()
>     108 
>     109     def on_kill(self):
> ~/repos/incubator-airflow/airflow/providers/apache/spark/hooks/spark_sql.py in run_query(self, cmd, **kwargs)
>     154             raise AirflowException(
>     155                 "Cannot execute {} on {}. Process exit code: {}.".format(
> --> 156                     cmd, self._conn.host, returncode
>     157                 )
>     158             )
> AirflowException: Cannot execute  on yarn. Process exit code: 1.
> {code}
> Most users would expect the executed query to be shown as the first value in the exception message and the "master" value (i.e., "local[*]" here) as the second, but meaningless information (an empty string and "yarn") is shown instead.
> The reasons are as follows:
> * The executed query is specified via the "sql" parameter of the {{SparkSqlHook.\_\_init__}} method, not via {{cmd}}.
> * The "master" value is specified via the "master" parameter of the {{SparkSqlHook.\_\_init__}} method, not via {{self._conn.host}}. In fact, {{self._conn}} is not used at all in SparkSqlHook.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)