You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/03/31 21:31:23 UTC
[GitHub] [airflow] Gollum999 opened a new issue #22670: Mixins in user-defined Operators do not get initialized
Gollum999 opened a new issue #22670:
URL: https://github.com/apache/airflow/issues/22670
### Apache Airflow version
2.2.4 (latest released)
### What happened
If a user defines a custom Operator that also uses a mixin class, there are some cases where that mixin will not be properly initialized. See example below.
I believe I've traced this down to the fact that `airflow.utils.log.logging_mixin.LoggingMixin.__init__` does not call `super().__init__`.
It is a common misconception that `super()` delegates to the base class of the type that is calling `super()`. It actually delegates to the next viable type from the Method Resolution Order (MRO) of the *entire inheritance hierarchy of the calling type*.
Importantly, this hierarchy may include user-defined types that are *completely unrelated to the type that is calling `super()`*. See [here](https://fuhm.net/super-harmful/) for a more in-depth description of the types of problems this can cause, and [here](https://rhettinger.wordpress.com/2011/05/26/super-considered-super/) for some tips for handling this in practice.
Unfortunately I think a proper fix will be much more involved than simply updating `LoggingMixin.__init__` to call `super().__init__(*args, **kwargs)`. I tried making this change locally, and I started getting errors like `TypeError: object.__init__() takes exactly one argument (the instance to initialize)`. Since these `__init__` functions are (almost) all parameterized, ultimately all types in the hierarchy need to coordinate in their usage of `super()` to consume all of the arguments before they reach the "root" call to `object.__init__`. The second article I linked shows some examples of this, but in practice I think it would essentially require global knowledge of every class in Airflow to confidently make this sort of change.
### What you think should happen instead
Users should be able to use mixins in custom operators.
### How to reproduce
Here's an example DAG that breaks:
```
#!/usr/bin/env python3
from datetime import datetime
from airflow import DAG
from airflow.models.baseoperator import BaseOperator
class MyMixin:
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.extra_message = ':)'
class HelloOperator(BaseOperator, MyMixin):
def __init__(self, *, name: str, **kwargs) -> None:
super().__init__(**kwargs)
self.name = name
def execute(self, context):
print(f'Hello {self.name}! {self.extra_message}')
with DAG(
'test_dag',
default_args={'retries': 0},
start_date=datetime(2022, 3, 30),
) as dag:
HelloOperator(task_id='task', name='Bob')
```
This fails with the following error:
```
[2022-03-31, 07:35:57 CDT] {taskinstance.py:1718} ERROR - Task failed with exception
Traceback (most recent call last):
File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1334, in _run_raw_task
self._execute_task_with_callbacks(context)
File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1460, in _execute_task_with_callbacks
result = self._execute_task(context, self.task)
File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1516, in _execute_task
result = execute_callable(context=context)
File "/home/tsanders/airflow_test/dags/test_dag.py", line 20, in execute
print(f'Hello {self.name}! {self.extra_message}')
AttributeError: 'HelloOperator' object has no attribute 'extra_message'
[2022-03-31, 07:35:57 CDT] {taskinstance.py:1272} INFO - Marking task as FAILED. dag_id=test_dag, task_id=task, execution_date=20220329T000000, start_date=20220331T173557, end_date=20220331T173557
[2022-03-31, 07:35:57 CDT] {standard_task_runner.py:89} ERROR - Failed to execute job 25 for task task
Traceback (most recent call last):
File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
args.func(args, dag=self.dag)
File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 298, in task_run
_run_task_by_selected_method(args, dag, ti)
File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
_run_raw_task(args, ti)
File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 180, in _run_raw_task
ti._run_raw_task(
File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1334, in _run_raw_task
self._execute_task_with_callbacks(context)
File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1460, in _execute_task_with_callbacks
result = self._execute_task(context, self.task)
File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1516, in _execute_task
result = execute_callable(context=context)
File "/home/tsanders/airflow_test/dags/test_dag.py", line 20, in execute
print(f'Hello {self.name}! {self.extra_message}')
AttributeError: 'HelloOperator' object has no attribute 'extra_message'
```
### Operating System
CentOS 7.4
### Versions of Apache Airflow Providers
N/A
### Deployment
Other
### Deployment details
Standalone
### Anything else
As a note, it is possible to work around this in user code by forcing a different MRO. For example, we can use `class HelloOperator(MyMixin, BaseOperator)` or `class MyMixin(LoggingMixin)` to ensure that `LoggingMixin` comes after `MyMixin` in the MRO. However, these workarounds also require that `HelloOperator` and `MyMixin` handle `super().__init__` correctly - otherwise you'll see this same type of issue where internal Airflow classes are not properly initialized.
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk edited a comment on issue #22670: Mixins in user-defined Operators do not get initialized
Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #22670:
URL: https://github.com/apache/airflow/issues/22670#issuecomment-1085162099
Yeah. We know about MRO. Simply don't use Mixins. The solution is simply to not use them. The best soluiton is to add docstring in the BaseOperator to not use Mixins in operators and warn the users. Mixins and mutliple inheritance are hard and it's completely counter-productive to try to support it. Would you like to make a PR for that?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #22670: Mixins in user-defined Operators do not get initialized
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #22670:
URL: https://github.com/apache/airflow/issues/22670#issuecomment-1085162099
Yeah. We know about MRO. Simply don't use Mixins. The solution is simply to not use them. The best soluiton is to add docstring in the operator to not use Mixins in operators and warn the users. Mixins and mutliple inheritance are hard and it's completely counter-productive to try to support it. Would you like to make a PR for that?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org