Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/02 16:32:54 UTC

[GitHub] [airflow] hterik opened a new issue #19983: Exclude airflow runner internals from Operator failure tracebacks

hterik opened a new issue #19983:
URL: https://github.com/apache/airflow/issues/19983


   ### Description
   
   Given a simple DAG like this:
   ```py
   from airflow import DAG
   from airflow.utils.dates import days_ago
   from airflow.operators.bash import BashOperator
   
   with DAG("instantfail",
            description="instantfail",
            tags=["example"],
            catchup=False,
            schedule_interval=None,
            start_date=days_ago(2),
           ) as dag:
   
       BashOperator(bash_command="echo hello && false", task_id="b")
   ```
   
   The error log from the failing bash command, `false`, becomes incredibly large, with lots of information that is irrelevant to the writer of the DAG. As if that weren't enough, the error is printed twice:
   ```
   [2021-12-02, 17:07:01 UTC] {subprocess.py:85} INFO - Output:
   [2021-12-02, 17:07:01 UTC] {subprocess.py:89} INFO - hello
   [2021-12-02, 17:07:01 UTC] {subprocess.py:93} INFO - Command exited with return code 1
   [2021-12-02, 17:07:01 UTC] {taskinstance.py:1703} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
       self._execute_task_with_callbacks(context)
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
       result = self._execute_task(context, self.task)
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
       result = execute_callable(context=context)
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/operators/bash.py", line 187, in execute
       raise AirflowException(
   airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 1.
   [2021-12-02, 17:07:01 UTC] {taskinstance.py:1270} INFO - Marking task as FAILED. dag_id=instantfail, task_id=b, execution_date=20211202T160659, start_date=20211202T160701, end_date=20211202T160701
   [2021-12-02, 17:07:01 UTC] {standard_task_runner.py:88} ERROR - Failed to execute job 292 for task b
   Traceback (most recent call last):
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
       args.func(args, dag=self.dag)
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/cli/cli_parser.py", line 48, in command
       return func(*args, **kwargs)
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/utils/cli.py", line 92, in wrapper
       return f(*args, **kwargs)
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 292, in task_run
       _run_task_by_selected_method(args, dag, ti)
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
       _run_raw_task(args, ti)
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 180, in _run_raw_task
       ti._run_raw_task(
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/utils/session.py", line 70, in wrapper
       return func(*args, session=session, **kwargs)
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
       self._execute_task_with_callbacks(context)
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
       result = self._execute_task(context, self.task)
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
       result = execute_callable(context=context)
     File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/operators/bash.py", line 187, in execute
       raise AirflowException(
   airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 1.
   [2021-12-02, 17:07:01 UTC] {local_task_job.py:154} INFO - Task exited with return code 1
   [2021-12-02, 17:07:01 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
   ```
   Operators are often more advanced than this, especially PythonOperator and TaskFlow tasks, which frequently produce equally deep stack traces of their own. Those traces are usually what you are actually interested in, but together everything accumulates into several pages of error output.
   
   I'm usually the one who thinks a stack trace is the best error message and likes the context it provides, but here I think it goes a bit too far.
   
   ------------------------------------------------------------
   
   1. Why is the error printed twice?
   2. The writer of a DAG should be able to assume the execution path up to `execute_callable()` was sane. If I'm debugging a DAG or an Operator, I don't want to debug Airflow task-runner internals. I don't know how achievable this is, but it should at least be the goal.
   
   The expected log output should contain only:
   ```
     File "airflow/operators/bash.py", line 187, in execute
       raise AirflowException(
   airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 1.
   ```
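
   For illustration, trimming down to roughly that output could look like the sketch below. The helper is hypothetical, and using the frame name `execute` as the cut-off between runner internals and operator code is a naive heuristic, not an existing Airflow mechanism:

   ```py
   # Hypothetical helper -- not part of Airflow. It drops every frame above
   # the operator's execute() frame before formatting the exception.
   import traceback


   def format_operator_traceback(exc: BaseException) -> str:
       tb = exc.__traceback__
       cursor = tb
       while cursor is not None:
           # Naive heuristic: the first frame named "execute" marks the
           # boundary between runner internals and operator code.
           if cursor.tb_frame.f_code.co_name == "execute":
               tb = cursor
               break
           cursor = cursor.tb_next
       return "".join(traceback.format_exception(type(exc), exc, tb))
   ```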
   
   Could `TaskInstance._execute_task` be made to catch the operator's exception and rethrow it as something else that `standard_task_runner` can then suppress at the end?
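
   As a self-contained toy of that idea (the `OperatorFailure` marker and both stand-in functions are hypothetical; Airflow has no such API today):

   ```py
   # Self-contained toy of the catch-and-rethrow idea. OperatorFailure and
   # both stand-in functions are hypothetical, not existing Airflow code.
   import logging

   log = logging.getLogger("task")


   class OperatorFailure(Exception):
       """Marker: the underlying error was already logged at the task level."""


   def execute_task(execute_callable):
       # Stand-in for TaskInstance._execute_task: log once, then wrap.
       try:
           return execute_callable()
       except Exception as err:
           log.exception("Task failed with exception")
           raise OperatorFailure() from err


   def run_task(execute_callable):
       # Stand-in for standard_task_runner: exit non-zero without printing
       # a second, runner-deep traceback for an already-logged failure.
       try:
           execute_task(execute_callable)
           return 0
       except OperatorFailure:
           return 1


   if __name__ == "__main__":
       logging.basicConfig(level=logging.INFO)

       def boom():
           raise RuntimeError("Bash command failed. Exit code 1.")

       raise SystemExit(run_task(boom))
   ```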
   
   Or is this something that should simply be solved by logging configuration?
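
   If the logging route is viable, one hypothetical shape for it is a formatter that truncates the traceback before it reaches the task log. Wiring it into the task handler (e.g. through a customized `LOGGING_CONFIG`) is left out, and the frame limit is an arbitrary assumption:

   ```py
   # Hypothetical formatter that keeps only the last few traceback frames.
   # Attaching it to Airflow's task log handler is omitted; shown standalone.
   import logging
   import traceback


   class ShortTracebackFormatter(logging.Formatter):
       MAX_FRAMES = 1  # assumption: only the deepest frames interest DAG authors

       def formatException(self, ei):
           etype, value, tb = ei
           frames = traceback.extract_tb(tb)[-self.MAX_FRAMES:]
           return "".join(
               ["Traceback (most recent call last):\n"]
               + traceback.format_list(frames)
               + traceback.format_exception_only(etype, value)
           )
   ```

   With `MAX_FRAMES = 1`, the BashOperator failure above would shrink to roughly the expected output shown earlier.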
   
   
   
   ### Use case/motivation
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   



[GitHub] [airflow] potiuk commented on issue #19983: Exclude airflow runner internals from Operator failure tracebacks

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19983:
URL: https://github.com/apache/airflow/issues/19983#issuecomment-986326284


   I think it depends on whose view you take into account. I am for adding as much information as possible to the stack trace, because you never know when more stack trace is useful. If a user reports a "half-hearted" stack trace, I am usually not able to help as a maintainer.
   
   Surely, if the information is duplicated - and you know how to de-duplicate it reliably - feel free to open a PR for that. It does look like it should be possible in this case. But I am personally not losing sleep over it. I usually err on the side of having more error information than needed rather than risking missing some - especially in cases like this, where you might expect ANY error from ANY environment.
   
   Hiding anything in this case makes it IMHO far too easy to lose crucial information that might help maintainers debug problems. Surely, in this case most of the stack trace is probably repetitive most of the time. But those errors also serve a different purpose - expect the unexpected. Thanks to the stack traces, you will sometimes be able to diagnose much more than you think. I cannot count how many times I discovered that someone's process actually used a different version of the software than they thought - simply because I noticed that it was impossible for that version to produce a stack trace with that particular line number.
   
   I'd personally rather focus (and have for quite some time) on making the "known" error messages more actionable and explanatory for users when we "know" what the issue is.
   
   But I think removing the full stack trace from the logs (as opposed to the duplication) will make it far harder for our users to get help, which for me trumps the conciseness of the logs.
   
   I am not sure if this is an "issue"/feature. Maybe others would be interested in chiming in, so I am turning it into a discussion. But if you have any concrete PR proposals to improve the logging of particular cases (without losing the "observability" of the issues when they appear), you are most welcome, @hterik.
   
   

