You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "m1racoli (via GitHub)" <gi...@apache.org> on 2023/02/03 20:33:32 UTC
[GitHub] [airflow] m1racoli opened a new issue, #29366: default_args feature incompatible with Dynamic Task Mapping
m1racoli opened a new issue, #29366:
URL: https://github.com/apache/airflow/issues/29366
### Apache Airflow version
2.5.1
### What happened
We use `default_args` to configure common parameters across our Airflow deployments. Some of those parameters are specific to certain operators or a group of operators.
When we use dynamic task mapping in addition to our `default_args` the DAG fails to parse.
### What you think should happen instead
The DAG should success to parse.
### How to reproduce
The DAG
```python
from typing import List
from airflow import DAG
from airflow.decorators import task
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago
default_args = dict(
start_date=days_ago(1),
use_legacy_sql=False,
something="else",
)
@task
def get_commands() -> List[str]:
return [
"echo hello",
"echo world",
]
with DAG(
"test_default_args",
default_args=default_args,
schedule=None,
) as dag:
commands = get_commands()
BashOperator.partial(
task_id="run_bash",
).expand(
bash_command=commands,
)
```
will result in the following error
```log
Broken DAG: [/usr/local/airflow/dags/test_dynamic_args.py] Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/airflow/models/mappedoperator.py", line 149, in __attrs_post_init__
validate_mapping_kwargs(self.operator_class, "partial", self.kwargs)
File "/usr/local/lib/python3.9/site-packages/airflow/models/mappedoperator.py", line 109, in validate_mapping_kwargs
raise TypeError(f"{op.__name__}.{func}() got {error}")
TypeError: BashOperator.partial() got unexpected keyword arguments 'use_legacy_sql', 'something'
```
### Operating System
Debian GNU/Linux 11 (bullseye)
### Versions of Apache Airflow Providers
```txt
apache-airflow-providers-amazon==6.2.0
apache-airflow-providers-apache-hive==5.1.1
apache-airflow-providers-apache-livy==3.2.0
apache-airflow-providers-celery==3.1.0
apache-airflow-providers-cncf-kubernetes==5.1.1
apache-airflow-providers-common-sql==1.3.3
apache-airflow-providers-databricks==4.0.0
apache-airflow-providers-dbt-cloud==2.3.1
apache-airflow-providers-elasticsearch==4.3.3
apache-airflow-providers-ftp==3.3.0
apache-airflow-providers-google==8.8.0
apache-airflow-providers-http==4.1.1
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-microsoft-azure==5.1.0
apache-airflow-providers-postgres==5.4.0
apache-airflow-providers-redis==3.1.0
apache-airflow-providers-sftp==4.2.1
apache-airflow-providers-snowflake==4.0.2
apache-airflow-providers-sqlite==3.3.1
apache-airflow-providers-ssh==3.4.0
```
### Deployment
Astronomer
### Deployment details
We use Astronomer Runtime 7.2.0.
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] vchiapaikeo commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "vchiapaikeo (via GitHub)" <gi...@apache.org>.
vchiapaikeo commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1464041406
I also recently ran into this. Just wanted to share that the issue ends up propping up as a dag parsing error on the UI:
```
Broken DAG: [/opt/airflow/dags/dataeng/batch/test_dynamic_spark.py] Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/mappedoperator.py", line 149, in __attrs_post_init__
validate_mapping_kwargs(self.operator_class, "partial", self.kwargs)
File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/mappedoperator.py", line 109, in validate_mapping_kwargs
raise TypeError(f"{op.__name__}.{func}() got {error}")
TypeError: BashOperator.partial() got unexpected keyword arguments 'project_id', 'gcp_conn_id', 'google_cloud_conn_id', 'impersonation_chain', 'auto_page_management', 'alert_in_test', 'abcdef'
```
<img width="1202" alt="image" src="https://user-images.githubusercontent.com/9200263/224366996-bd46daef-62f0-48d4-8caa-ec9c7c586f2c.png">
I'm actively looking for workarounds 👀
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1437495983
@uranusjr - do you think it would be possible/easy to add default_args to partial invocation automatically, sounds like a possibility, my only worry is whether expand would override them if they are also specified there ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1438031619
> Overriding params in expand which are defined via default_args would be intended behaviour and in line with the behaviour of normal operators, wouldn't be?
Yes. By "worry" I mean, whether there are no problems somewhere deep the stack that would make it difficult or have side efffects that we do not see without deep knowledge of how dynamic task mapping is done. It's easy to say "things should be done this way" and much more difficult to do it.
BTW. @m1racoli -> maybe you would like to implement a POC for that one to dispell the worries? That would be good opportunity to learn the deep internals and contribute back to the project.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] vchiapaikeo commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "vchiapaikeo (via GitHub)" <gi...@apache.org>.
vchiapaikeo commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1465278077
@uranusjr , what do you think about something like this to remove invalid keys from OperatorPartial's self.kwargs?
https://github.com/apache/airflow/pull/30056
I'm using the below DAG to test and it's working as expected - valid args like `env` are applied to BashOperator and invalid ones like `abcdef` are deleted from self.kwargs.
```py
from airflow import DAG
from airflow.decorators import task
from airflow.operators.bash import BashOperator
DEFAULT_TASK_ARGS = {
"owner": "gcp-data-platform",
"start_date": "2023-03-11",
"retries": 0,
"abcdef": 12312,
"env": {
"HELLO": "WORLD",
},
}
@task
def make_list():
# This can also be from an API call, checking a database, -- almost anything you like, as long as the
# resulting list/dictionary can be stored in the current XCom backend.
return [
"echo 1 ${HELLO}",
"echo 2 ${HELLO}",
"echo 3 ${HELLO}",
]
with DAG(
schedule_interval="@daily",
max_active_runs=1,
max_active_tasks=10,
catchup=False,
dag_id="test_expand_with_default_args",
default_args=DEFAULT_TASK_ARGS,
) as dag:
xcom_output = make_list.override(do_xcom_push=True)()
t = BashOperator.partial(task_id="bash_op").expand(bash_command=xcom_output)
```
Granted, this does make the call to `validate_mapping_kwargs` in `__attrs_post_init__` obsolete. Do we really need this validation step if we are guaranteed to only be passing valid kwargs through now?
Successful DAG:
<img width="1440" alt="image" src="https://user-images.githubusercontent.com/9200263/224566371-2452ce6c-42ed-4271-9b10-50c03ef47aaa.png">
Mapped Task (1):
<img width="1440" alt="image" src="https://user-images.githubusercontent.com/9200263/224566386-f067f09e-d98d-4054-83dd-ed3b7bc37058.png">
If this looks good to you, I can work on unit tests here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [I] default_args feature incompatible with Dynamic Task Mapping [airflow]
Posted by "harshavardhan (via GitHub)" <gi...@apache.org>.
harshavardhan commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1954145564
I am encountering the same issue. Wondering if there are any known workarounds.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] m1racoli commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "m1racoli (via GitHub)" <gi...@apache.org>.
m1racoli commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1419524605
> IMO it should fail for the normal operators as well.
That would limit the use case for default_args dramatically. I kinda expected the current behaviour for normal operators to be part of the feature set of default_args.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] m1racoli commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "m1racoli (via GitHub)" <gi...@apache.org>.
m1racoli commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1416921522
The above parameters are just an example. The point is that centrally define certain parameters which only apply to particular operators and are ignored for other operators. This works with normal operators, but not with mapped operators.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] pankajastro commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "pankajastro (via GitHub)" <gi...@apache.org>.
pankajastro commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1418942076
hmm, I agree that it is not consistent but when you pass a kwargs using `default_args` then that kwargs is applied to all operators of a DAG off course you override the value of these params. what should be the right behaviour here, should it fail for normal operators as well or should work for mapped operators? IMO it should fail for the normal operators as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1437493283
I am for leaving it as is. Dynamic Task mapping is special case and specifically using part cc: @uranusjr
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] uranusjr commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "uranusjr (via GitHub)" <gi...@apache.org>.
uranusjr commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1439541245
Intuitively it feels possible, except `default_args` is much more volatile to get right since it deals with unspecified arguments from indefinite operator classes. A poc would make discussion much easier.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] bryzgaloff commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "bryzgaloff (via GitHub)" <gi...@apache.org>.
bryzgaloff commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1432666361
Hi @m1racoli thank you for reporting an issue :+1: I have encountered the same limitation which was surprising to me. I agree that `default_args` should keep working for dynamic tasks mapping and do it the same way as for regular operators: applied only when needed for a particular operator based on its signature. Probably, `default_args` are simply copied into `expand`'s too broad `**mapped_kwargs` signature which causes an issue.
**As a quick potential workaround**, you may wrap a mapped operator into a `TaskGroup` with `default_args={}`, because here is what [docs say](https://airflow.apache.org/docs/apache-airflow/2.5.1/core-concepts/dags.html#taskgroups):
> `TaskGroup` also supports `default_args` like DAG, it will **overwrite the `default_args` in DAG level**
I have _not_ tested myself but accidentally noticed this in the documentation, so if you have a chance to try it out, please let me know if it works.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] pankajastro commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "pankajastro (via GitHub)" <gi...@apache.org>.
pankajastro commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1416820439
but both params `use_legacy_sql` and `something` does look correct to me. How you are using these parameters?
I believe it does not throw errors in case of non-dynamic tasks because https://github.com/apache/airflow/blob/main/airflow/models/baseoperator.py#L374 block i.e we copy only valid param from default to kwargs so invalid param get ignored.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] subuserzero commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "subuserzero (via GitHub)" <gi...@apache.org>.
subuserzero commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1442444585
> **As a quick potential workaround**, you may wrap a mapped operator into a `TaskGroup` with `default_args={}`, because here is what [docs say](https://airflow.apache.org/docs/apache-airflow/2.5.1/core-concepts/dags.html#taskgroups):
>
> > `TaskGroup` also supports `default_args` like DAG, it will **overwrite the `default_args` in DAG level**
>
> I have _not_ tested myself but accidentally noticed this in the documentation, so if you have a chance to try it out, please let me know if it works.
That did not seem to work for me, but replacing the `default_args` of the DAG inside the task-group definition with an empty dict seems to get past the TypeError:
@task_group(group_id='foo_group', dag=<dag-ref>)
def foo_group():
dag.default_args = {}
....
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk closed issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #29366: default_args feature incompatible with Dynamic Task Mapping
URL: https://github.com/apache/airflow/issues/29366
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] uranusjr commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "uranusjr (via GitHub)" <gi...@apache.org>.
uranusjr commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1465530485
There is already #29913 so you’ll need to take it into account.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] m1racoli commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping
Posted by "m1racoli (via GitHub)" <gi...@apache.org>.
m1racoli commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1438025543
> my only worry is whether expand would override them if they are also specified there
Overriding params in expand which are defined via default_args would be intended behaviour and in line with the behaviour of normal operators, wouldn't be?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [I] default_args feature incompatible with Dynamic Task Mapping [airflow]
Posted by "asafsneh (via GitHub)" <gi...@apache.org>.
asafsneh commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1868951831
Hi @potiuk , thank you for the helpful recommendation and explanation. I apologize if my previous question may have come across as rude or entitled. It clearly slipped my mind that the people behind this open-source project aren't working specifically for me or others and indeed, expecting an ETA isn't quite fair or respectful.
I appreciate your understanding and the community's effort in maintaining the project. To express my gratitude and commitment, I'll certainly try my best to contribute and solve the problem by working on the code myself.
I hope my skills are strong enough.
Thanks again for your time and continuous efforts!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [I] default_args feature incompatible with Dynamic Task Mapping [airflow]
Posted by "asafsneh (via GitHub)" <gi...@apache.org>.
asafsneh commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1868885480
Hi
Is there any ETA to fix this issue?
It also happens to us
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [I] default_args feature incompatible with Dynamic Task Mapping [airflow]
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1868941017
As everything in Open Source - if you care about a problem the best way to speed up a solution is to create and lead to completion PR fixing it. This is what I heartily recommend. The second best is to encourage people who create such PR and volunteer to help to test in-progress solutions.- apply patches, review the code and generally thank people who do the job and to help them to complete it. The third best is to pay someone to do the fix - if you have no time or skills to do either of the two.
This is generally who things work with open source where you get software for free and without any guarantees, having an ETA expectation when you do not contribute is kind of strange if you ask me
But you have all the ways above to contribute I described above @asafsneh . Which one will you choose to speed up resolution of the problem I wonder?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org