Posted to commits@airflow.apache.org by "andrew-parsons-janus (via GitHub)" <gi...@apache.org> on 2023/11/14 21:22:27 UTC

[I] "AirflowException: Invalid arguments were passed to GlueJobOperator" when setting update_config=True [airflow]

andrew-parsons-janus opened a new issue, #35637:
URL: https://github.com/apache/airflow/issues/35637

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   # Versions
   
   Airflow version: 2.4.3
   apache-airflow-providers-amazon==8.9.0
   
   # Airflow error
   
   ```
   Broken DAG: [/opt/airflow/dags/<project_directory>/<dag>.py] Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/baseoperator.py", line 408, in apply_defaults
       result = func(self, **kwargs, default_args=default_args)
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/baseoperator.py", line 756, in __init__
       raise AirflowException(
   airflow.exceptions.AirflowException: Invalid arguments were passed to GlueJobOperator (task_id: submit_glue_job). Invalid arguments were:
   **kwargs: {'update_config': True}
   ```
   
   
   ### What you think should happen instead
   
   I don't believe passing an argument to `GlueJobOperator`'s `update_config` init parameter should throw an exception. I also don't believe that I should have to set `ALLOW_ILLEGAL_ARGUMENTS`. Why would a documented parameter be illegal?
   
   If this is expected behavior, then it should be better documented.
   
   ### How to reproduce
   
   ## Airflow task
   
   ```python3
   @task_group(group_id="...", default_args=None)
   def run_glue_job(job_name: str, script_args: dict):
       submit_glue_job = GlueJobOperator(
   
           # BaseOperator kwargs
           task_id="submit_glue_job",
           retries=0,
           wait_for_completion=False,
           on_success_callback=foo,
           on_failure_callback=[foo, bar],
   
           # GlueJobOperator kwargs
           job_name=job_name,
           script_location=f"{S3_PATH}/glue_script.py",
           script_args=script_args,
   
           update_config=True,  # <-- this causes an exception!
   
           **EXTRA_KWARGS_GLUE,
       )
   
       wait_on_glue_job = GlueJobSensor(
           task_id="wait_on_glue_job",
           job_name=job_name,
           run_id=submit_glue_job.output,  # type: ignore
           on_failure_callback=bar,
       )
   
       submit_glue_job >> wait_on_glue_job
   ```
   
   ### Operating System
   
   macOS 14.1.1
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==8.9.0
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   Docker version 20.10.24, build 297e128
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] "AirflowException: Invalid arguments were passed to GlueJobOperator" when setting update_config=True [airflow]

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #35637:
URL: https://github.com/apache/airflow/issues/35637#issuecomment-1811340097

   Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.
   




Re: [I] "AirflowException: Invalid arguments were passed to GlueJobOperator" when setting update_config=True [airflow]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #35637:
URL: https://github.com/apache/airflow/issues/35637#issuecomment-1847993629

   This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.




Re: [I] "AirflowException: Invalid arguments were passed to GlueJobOperator" when setting update_config=True [airflow]

Posted by "andrew-parsons-janus (via GitHub)" <gi...@apache.org>.
andrew-parsons-janus closed issue #35637: "AirflowException: Invalid arguments were passed to GlueJobOperator" when setting update_config=True
URL: https://github.com/apache/airflow/issues/35637




Re: [I] "AirflowException: Invalid arguments were passed to GlueJobOperator" when setting update_config=True [airflow]

Posted by "andrew-parsons-janus (via GitHub)" <gi...@apache.org>.
andrew-parsons-janus commented on issue #35637:
URL: https://github.com/apache/airflow/issues/35637#issuecomment-1850309032

   This was embarrassing. Sorry for creating noise! I was indeed running an old version of `apache-airflow-providers-amazon` without realizing it.
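
One way to catch this kind of mismatch early is to ask the class that actually gets imported whether its `__init__` names the parameter. The following is only a minimal sketch: `accepts_parameter`, `OldOperator`, and `NewOperator` are hypothetical names made up for illustration, not part of Airflow, and it assumes `inspect.signature` can see through any wrapper around `__init__`.

```python
import inspect


def accepts_parameter(cls, name):
    """Return True if cls.__init__ declares `name` as an explicit parameter.

    A bare **kwargs does not count: as the traceback in this issue shows,
    leftover keyword arguments end up in BaseOperator and are rejected there.
    """
    params = inspect.signature(cls.__init__).parameters
    param = params.get(name)
    return param is not None and param.kind is not inspect.Parameter.VAR_KEYWORD


# Hypothetical stand-ins for an older and a newer operator release.
class OldOperator:
    def __init__(self, task_id, job_name, **kwargs):
        self.task_id = task_id
        self.job_name = job_name


class NewOperator:
    def __init__(self, task_id, job_name, update_config=False, **kwargs):
        self.task_id = task_id
        self.job_name = job_name
        self.update_config = update_config


if __name__ == "__main__":
    for cls in (OldOperator, NewOperator):
        print(cls.__name__, accepts_parameter(cls, "update_config"))
```

In a real environment the same helper could be pointed at `GlueJobOperator`, imported from `airflow.providers.amazon.aws.operators.glue`, inside the container that parses the DAG.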




Re: [I] "AirflowException: Invalid arguments were passed to GlueJobOperator" when setting update_config=True [airflow]

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #35637:
URL: https://github.com/apache/airflow/issues/35637#issuecomment-1811983282

   Unable to reproduce on the latest main.
   
   This DAG is created without any issues:
   
   ```python
   import pendulum
   
   from airflow.decorators import task_group
   from airflow.models.dag import DAG
   from airflow.providers.amazon.aws.sensors.glue import GlueJobSensor
   from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
   
   
   S3_PATH = "s3://foo/bar"
   EXTRA_KWARGS_GLUE = {}
   
   
   with DAG(
       "issue_35637",
       start_date=pendulum.datetime(2023, 6, 1, tz="UTC"),
       schedule=None,
       catchup=False,
       tags=["issue", "35637"]
   ):
   
       @task_group(group_id="foo-bar", default_args=None)
       def run_glue_job(job_name: str, script_args: dict):
           submit_glue_job = GlueJobOperator(
               # BaseOperator kwargs
               task_id="submit_glue_job",
               retries=0,
               wait_for_completion=False,
               # GlueJobOperator kwargs
               job_name=job_name,
               script_location=f"{S3_PATH}/glue_script.py",
               script_args=script_args,
               update_config=True,
               **EXTRA_KWARGS_GLUE,
           )
   
           wait_on_glue_job = GlueJobSensor(
               task_id="wait_on_glue_job",
               job_name=job_name,
               run_id=submit_glue_job.output,
           )
   
           submit_glue_job >> wait_on_glue_job
   
   
       run_glue_job(job_name="foo", script_args={})
   ```
   
   You should test on a [more recent version of Airflow](https://airflow.apache.org/docs/apache-airflow/stable/release_notes.html#release-notes); there have been many changes and fixes in Airflow core since then, including to the `task_group` decorator.
   
   In addition, check that all of your Airflow components use the same version of the Amazon provider. The change that added this field was released in 7.4.0 (https://github.com/apache/airflow/pull/30162), and your scheduler, dag_processor, or worker may be using a different version (6.0.0 ???) that is older than this one.
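
The version check suggested above can be run inside each component (scheduler, dag_processor, worker) with a few lines of standard-library Python. This is a rough sketch rather than an official Airflow tool: the helper names are hypothetical, the 7.4.0 threshold is taken from the comment above, and the version parsing is deliberately simplistic (it only looks at the leading numeric parts).

```python
from importlib import metadata

PROVIDER_DIST = "apache-airflow-providers-amazon"
MIN_VERSION = (7, 4, 0)  # release said above to have added update_config


def parse_version(text):
    """Turn '8.9.0' into (8, 9, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed_version(dist_name=PROVIDER_DIST):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def supports_update_config(version_text):
    """True if the given provider version is at least MIN_VERSION."""
    return version_text is not None and parse_version(version_text) >= MIN_VERSION


if __name__ == "__main__":
    found = installed_version()
    print(f"{PROVIDER_DIST}: {found!r}, "
          f"supports update_config: {supports_update_config(found)}")
```

Running the script in every container and comparing the printed versions should show whether one component is still on an older provider than the others.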

