You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "m1racoli (via GitHub)" <gi...@apache.org> on 2023/02/03 20:33:32 UTC

[GitHub] [airflow] m1racoli opened a new issue, #29366: default_args feature incompatible with Dynamic Task Mapping

m1racoli opened a new issue, #29366:
URL: https://github.com/apache/airflow/issues/29366

   ### Apache Airflow version
   
   2.5.1
   
   ### What happened
   
   We use `default_args` to configure common parameters across our Airflow deployments. Some of those parameters are specific to certain operators or a group of operators.
   
   When we use dynamic task mapping in addition to our `default_args` the DAG fails to parse.
   
   
   ### What you think should happen instead
   
   The DAG should success to parse.
   
   ### How to reproduce
   
   The DAG
   
   ```python
   from typing import List
   
   from airflow import DAG
   from airflow.decorators import task
   from airflow.operators.bash import BashOperator
   from airflow.utils.dates import days_ago
   
   default_args = dict(
       start_date=days_ago(1),
       use_legacy_sql=False,
       something="else",
   )
   
   
   @task
   def get_commands() -> List[str]:
       return [
           "echo hello",
           "echo world",
       ]
   
   
   with DAG(
       "test_default_args",
       default_args=default_args,
       schedule=None,
   ) as dag:
       commands = get_commands()
   
       BashOperator.partial(
           task_id="run_bash",
       ).expand(
           bash_command=commands,
       )
   ```
   
   will result in the following error
   
   ```log
   Broken DAG: [/usr/local/airflow/dags/test_dynamic_args.py] Traceback (most recent call last):
     File "/usr/local/lib/python3.9/site-packages/airflow/models/mappedoperator.py", line 149, in __attrs_post_init__
       validate_mapping_kwargs(self.operator_class, "partial", self.kwargs)
     File "/usr/local/lib/python3.9/site-packages/airflow/models/mappedoperator.py", line 109, in validate_mapping_kwargs
       raise TypeError(f"{op.__name__}.{func}() got {error}")
   TypeError: BashOperator.partial() got unexpected keyword arguments 'use_legacy_sql', 'something'
   ```
   
   ### Operating System
   
   Debian GNU/Linux 11 (bullseye)
   
   ### Versions of Apache Airflow Providers
   
   ```txt
   apache-airflow-providers-amazon==6.2.0
   apache-airflow-providers-apache-hive==5.1.1
   apache-airflow-providers-apache-livy==3.2.0
   apache-airflow-providers-celery==3.1.0
   apache-airflow-providers-cncf-kubernetes==5.1.1
   apache-airflow-providers-common-sql==1.3.3
   apache-airflow-providers-databricks==4.0.0
   apache-airflow-providers-dbt-cloud==2.3.1
   apache-airflow-providers-elasticsearch==4.3.3
   apache-airflow-providers-ftp==3.3.0
   apache-airflow-providers-google==8.8.0
   apache-airflow-providers-http==4.1.1
   apache-airflow-providers-imap==3.1.1
   apache-airflow-providers-microsoft-azure==5.1.0
   apache-airflow-providers-postgres==5.4.0
   apache-airflow-providers-redis==3.1.0
   apache-airflow-providers-sftp==4.2.1
   apache-airflow-providers-snowflake==4.0.2
   apache-airflow-providers-sqlite==3.3.1
   apache-airflow-providers-ssh==3.4.0
   ```
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   We use Astronomer Runtime 7.2.0.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vchiapaikeo commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "vchiapaikeo (via GitHub)" <gi...@apache.org>.
vchiapaikeo commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1464041406

   I also recently ran into this. Just wanted to share that the issue ends up propping up as a dag parsing error on the UI:
   
   ```
   Broken DAG: [/opt/airflow/dags/dataeng/batch/test_dynamic_spark.py] Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/mappedoperator.py", line 149, in __attrs_post_init__
       validate_mapping_kwargs(self.operator_class, "partial", self.kwargs)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/mappedoperator.py", line 109, in validate_mapping_kwargs
       raise TypeError(f"{op.__name__}.{func}() got {error}")
   TypeError: BashOperator.partial() got unexpected keyword arguments 'project_id', 'gcp_conn_id', 'google_cloud_conn_id', 'impersonation_chain', 'auto_page_management', 'alert_in_test', 'abcdef'
   ```
   
   <img width="1202" alt="image" src="https://user-images.githubusercontent.com/9200263/224366996-bd46daef-62f0-48d4-8caa-ec9c7c586f2c.png">
   
   I'm actively looking for workarounds 👀 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1437495983

   @uranusjr - do you think it would be possible/easy to add default_args to partial invocation automatically, sounds like a possibility, my only worry is whether expand would override them if they are also specified there ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1438031619

   > Overriding params in expand which are defined via default_args would be intended behaviour and in line with the behaviour of normal operators, wouldn't be?
   
   Yes. By "worry" I mean, whether there are no problems somewhere deep the stack that would make it difficult or have side efffects that we do not see without deep knowledge of how dynamic task mapping is done. It's easy to say "things should be done this way" and much more difficult to do it.
   
   BTW. @m1racoli -> maybe you would like to implement a POC for that one to dispell the worries? That would be good opportunity to learn the deep internals and contribute back to the project. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vchiapaikeo commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "vchiapaikeo (via GitHub)" <gi...@apache.org>.
vchiapaikeo commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1465278077

   @uranusjr , what do you think about something like this to remove invalid keys from OperatorPartial's self.kwargs?
   
   https://github.com/apache/airflow/pull/30056
   
   I'm using the below DAG to test and it's working as expected - valid args like `env` are applied to BashOperator and invalid ones like `abcdef` are deleted from self.kwargs.
   
   ```py
   from airflow import DAG
   
   from airflow.decorators import task
   from airflow.operators.bash import BashOperator
   
   DEFAULT_TASK_ARGS = {
       "owner": "gcp-data-platform",
       "start_date": "2023-03-11",
       "retries": 0,
       "abcdef": 12312,
       "env": {
           "HELLO": "WORLD",
       },
   }
   
   
   @task
   def make_list():
       # This can also be from an API call, checking a database, -- almost anything you like, as long as the
       # resulting list/dictionary can be stored in the current XCom backend.
       return [
           "echo 1 ${HELLO}",
           "echo 2 ${HELLO}",
           "echo 3 ${HELLO}",
       ]
   
   with DAG(
       schedule_interval="@daily",
       max_active_runs=1,
       max_active_tasks=10,
       catchup=False,
       dag_id="test_expand_with_default_args",
       default_args=DEFAULT_TASK_ARGS,
   ) as dag:
       xcom_output = make_list.override(do_xcom_push=True)()
   
       t = BashOperator.partial(task_id="bash_op").expand(bash_command=xcom_output)
   ```
   
   Granted, this does make the call to `validate_mapping_kwargs` in `__attrs_post_init__` obsolete. Do we really need this validation step if we are guaranteed to only be passing valid kwargs through now?
   
   
   Successful DAG:
   
   <img width="1440" alt="image" src="https://user-images.githubusercontent.com/9200263/224566371-2452ce6c-42ed-4271-9b10-50c03ef47aaa.png">
   
   Mapped Task (1):
   
   <img width="1440" alt="image" src="https://user-images.githubusercontent.com/9200263/224566386-f067f09e-d98d-4054-83dd-ed3b7bc37058.png">
   
   
   If this looks good to you, I can work on unit tests here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] default_args feature incompatible with Dynamic Task Mapping [airflow]

Posted by "harshavardhan (via GitHub)" <gi...@apache.org>.
harshavardhan commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1954145564

   I am encountering the same issue. Wondering if there are any known workarounds.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] m1racoli commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "m1racoli (via GitHub)" <gi...@apache.org>.
m1racoli commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1419524605

   > IMO it should fail for the normal operators as well.
   
   That would limit the use case for default_args dramatically. I kinda expected the current behaviour for normal operators to be part of the feature set of default_args.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] m1racoli commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "m1racoli (via GitHub)" <gi...@apache.org>.
m1racoli commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1416921522

   The above parameters are just an example. The point is that centrally define certain parameters which only apply to particular operators and are ignored for other operators. This works with normal operators, but not with mapped operators.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "pankajastro (via GitHub)" <gi...@apache.org>.
pankajastro commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1418942076

   hmm, I agree that it is not consistent but when you pass a kwargs using `default_args` then that kwargs is applied to all operators of a DAG off course you override the value of these params. what should be the right behaviour here, should it fail for normal operators as well or should work for mapped operators?  IMO it should fail for the normal operators as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1437493283

   I am for leaving it as is. Dynamic Task mapping is special case and specifically using part cc: @uranusjr 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "uranusjr (via GitHub)" <gi...@apache.org>.
uranusjr commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1439541245

   Intuitively it feels possible, except `default_args` is much more volatile to get right since it deals with unspecified arguments from indefinite operator classes. A poc would make discussion much easier.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bryzgaloff commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "bryzgaloff (via GitHub)" <gi...@apache.org>.
bryzgaloff commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1432666361

   Hi @m1racoli thank you for reporting an issue :+1: I have encountered the same limitation which was surprising to me. I agree that `default_args` should keep working for dynamic tasks mapping and do it the same way as for regular operators: applied only when needed for a particular operator based on its signature. Probably, `default_args` are simply copied into `expand`'s too broad `**mapped_kwargs` signature which causes an issue.
   
   **As a quick potential workaround**, you may wrap a mapped operator into a `TaskGroup` with `default_args={}`, because here is what [docs say](https://airflow.apache.org/docs/apache-airflow/2.5.1/core-concepts/dags.html#taskgroups):
   
   > `TaskGroup` also supports `default_args` like DAG, it will **overwrite the `default_args` in DAG level**
   
   I have _not_ tested myself but accidentally noticed this in the documentation, so if you have a chance to try it out, please let me know if it works.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "pankajastro (via GitHub)" <gi...@apache.org>.
pankajastro commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1416820439

   but both params `use_legacy_sql` and `something` does look correct to me. How you are using these parameters? 
   I believe it does not throw errors in case of non-dynamic tasks because https://github.com/apache/airflow/blob/main/airflow/models/baseoperator.py#L374 block i.e we copy only valid param from default to kwargs so invalid param get ignored.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] subuserzero commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "subuserzero (via GitHub)" <gi...@apache.org>.
subuserzero commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1442444585

   > **As a quick potential workaround**, you may wrap a mapped operator into a `TaskGroup` with `default_args={}`, because here is what [docs say](https://airflow.apache.org/docs/apache-airflow/2.5.1/core-concepts/dags.html#taskgroups):
   > 
   > > `TaskGroup` also supports `default_args` like DAG, it will **overwrite the `default_args` in DAG level**
   > 
   > I have _not_ tested myself but accidentally noticed this in the documentation, so if you have a chance to try it out, please let me know if it works.
   
   That did not seem to work for me, but replacing the `default_args` of the DAG inside the task-group definition with an empty dict seems to get past the TypeError:
   
       @task_group(group_id='foo_group', dag=<dag-ref>)
       def foo_group():
           dag.default_args = {}
           ....
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk closed issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #29366: default_args feature incompatible with Dynamic Task Mapping
URL: https://github.com/apache/airflow/issues/29366


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "uranusjr (via GitHub)" <gi...@apache.org>.
uranusjr commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1465530485

   There is already #29913 so you’ll need to take it into account.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] m1racoli commented on issue #29366: default_args feature incompatible with Dynamic Task Mapping

Posted by "m1racoli (via GitHub)" <gi...@apache.org>.
m1racoli commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1438025543

   > my only worry is whether expand would override them if they are also specified there
   Overriding params in expand which are defined via default_args would be intended behaviour and in line with the behaviour of normal operators, wouldn't be?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] default_args feature incompatible with Dynamic Task Mapping [airflow]

Posted by "asafsneh (via GitHub)" <gi...@apache.org>.
asafsneh commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1868951831

   Hi @potiuk , thank you for the helpful recommendation and explanation. I apologize if my previous question may have come across as rude or entitled. It clearly slipped my mind that the people behind this open-source project aren't working specifically for me or others and indeed, expecting an ETA isn't quite fair or respectful.
   
   I appreciate your understanding and the community's effort in maintaining the project. To express my gratitude and commitment, I'll certainly try my best to contribute and solve the problem by working on the code myself. 
   I hope my skills are strong enough.
   Thanks again for your time and continuous efforts!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] default_args feature incompatible with Dynamic Task Mapping [airflow]

Posted by "asafsneh (via GitHub)" <gi...@apache.org>.
asafsneh commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1868885480

   Hi
   Is there any ETA to fix this issue?
   It also happens to us


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] default_args feature incompatible with Dynamic Task Mapping [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29366:
URL: https://github.com/apache/airflow/issues/29366#issuecomment-1868941017

   As everything in Open Source - if you care about a problem the best way to speed up a solution is to create and lead to completion PR fixing it. This is what I heartily recommend. The second best is to encourage people who create such PR and volunteer to help to test in-progress solutions.- apply patches, review the code and generally thank people who do the job and to help them to complete it. The third best is to pay someone to do the fix - if you have no time or skills to do either of the two.
   
   
   This is generally who things work with open source where you get software for free and without any guarantees, having an ETA expectation when you do not contribute is kind of strange if you ask me
   
   But you have all the ways above to contribute I described above @asafsneh . Which one will you choose to speed up resolution of the problem I wonder?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org