You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Dafyddjb (via GitHub)" <gi...@apache.org> on 2023/03/07 16:44:42 UTC

[GitHub] [airflow] Dafyddjb opened a new issue, #29960: GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version.

Dafyddjb opened a new issue, #29960:
URL: https://github.com/apache/airflow/issues/29960

   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon = "7.3.0"
   
   ### Apache Airflow version
   
   2.5.1
   
   ### Operating System
   
   Debian GNU/Linux
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   After updating the provider version to 7.3.0 from 6.0.0, our glue jobs started failing. We currently use the GlueJobOperator to run existing Glue jobs that we manage in Terraform.  The full traceback is below:
   ```
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 150, in execute
       glue_job_run = glue_job.initialize_job(self.script_args, self.run_job_kwargs)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 165, in initialize_job
       job_name = self.create_or_update_glue_job()
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 325, in create_or_update_glue_job
       config = self.create_glue_job_config()
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 108, in create_glue_job_config
       execution_role = self.get_iam_execution_role()
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 143, in get_iam_execution_role
       glue_execution_role = iam_client.get_role(RoleName=self.role_name)
     File "/home/airflow/.local/lib/python3.9/site-packages/botocore/client.py", line 530, in _api_call
       return self._make_api_call(operation_name, kwargs)
     File "/home/airflow/.local/lib/python3.9/site-packages/botocore/client.py", line 919, in _make_api_call
       request_dict = self._convert_to_request_dict(
     File "/home/airflow/.local/lib/python3.9/site-packages/botocore/client.py", line 990, in _convert_to_request_dict
       request_dict = self._serializer.serialize_to_request(
     File "/home/airflow/.local/lib/python3.9/site-packages/botocore/validate.py", line 381, in serialize_to_request
       raise ParamValidationError(report=report.generate_report())
   botocore.exceptions.ParamValidationError: Parameter validation failed:
   Invalid type for parameter RoleName, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
   ```
   
   
   ### What you think should happen instead
   
   The operator creates a new job run for a glue job without additional configuration. 
   
   ### How to reproduce
   
   Create a DAG with a GlueJobOperator without using `script_location`, `s3_bucket` and `iam_role_name`. Example:
   
   ```python
   task = GlueJobOperator(task_id="glue-task", job_name=<glue-job-name>)
   ```
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk closed issue #29960: GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version.

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #29960: GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version.
URL: https://github.com/apache/airflow/issues/29960


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] o-nikolas commented on issue #29960: GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version.

Posted by "o-nikolas (via GitHub)" <gi...@apache.org>.
o-nikolas commented on issue #29960:
URL: https://github.com/apache/airflow/issues/29960#issuecomment-1459095862

   Looks like this is a bug introduced in #27893
   
   We now try generate the config for the Glue Job before we know whether we're creating a new one or updating an existing job. If the job already exists the user doesn't need to provide as much info but the new method for generating the config assumes all info for creating a new job is present (in this case the role). So it breaks since this user hasn't provided a role, because their job already exists.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] romibuzi commented on issue #29960: GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version.

Posted by "romibuzi (via GitHub)" <gi...@apache.org>.
romibuzi commented on issue #29960:
URL: https://github.com/apache/airflow/issues/29960#issuecomment-1460767382

   Oh damn indeed the operator will now try to update configuration :( 
   Maybe the best way to handle both ways would be to add a parameter in the operator `update_config: bool = False`. And the update behavior would only happens in case this parameter is set to True?
   
   Or as @Taragolis advised we split this operator in 2 and create another one `GlueStartJobOperator` which will only be responsible to start an existing Glue job? Note that with this strategy the backword compatibility would be more problematic 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #29960: GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version.

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29960:
URL: https://github.com/apache/airflow/issues/29960#issuecomment-1461640908

   > This would be a straightforward solution for sure! But it would be an API change, and we can't put the genie back in the bottle: your change is already out and folks could be already using it with the expectation that it updates the job without the need for a flag. However, I think that's a pretty small contingent of folks, and we could consider this behaviour a bug, honestly, so fixing it in this way should probably not trigger back compat. I'd be interested to hear what others think.
   
   Yes. API change is sometimes necessary to fix bugs. We are not (and should not be) extremely strict with "any api change is backwards incompatible" - if for example the interface is unusable or it behaves in  unpredictable or illogical ways, it's fine to introduce such changes as bugfixes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #29960: GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version.

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29960:
URL: https://github.com/apache/airflow/issues/29960#issuecomment-1459070017

   cc: @o-nikolas @vincbeck @ferruzzi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on issue #29960: GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version.

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #29960:
URL: https://github.com/apache/airflow/issues/29960#issuecomment-1458492263

   Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] o-nikolas commented on issue #29960: GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version.

Posted by "o-nikolas (via GitHub)" <gi...@apache.org>.
o-nikolas commented on issue #29960:
URL: https://github.com/apache/airflow/issues/29960#issuecomment-1459158642

   Worse than that, the whole updating job logic makes all sorts of assumptions that don't hold true if the job is created outside of the operator (like in this case where the user created the job with terraform). The new logic will detect that the defaults in the operator generated config are different from whatever was used outside of airflow and it will try to clobber the Job. I didn't get a chance to review #27893, but it has many back-compat issues...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] o-nikolas commented on issue #29960: GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version.

Posted by "o-nikolas (via GitHub)" <gi...@apache.org>.
o-nikolas commented on issue #29960:
URL: https://github.com/apache/airflow/issues/29960#issuecomment-1461068942

   > Oh damn indeed the operator will now try to update configuration :( Maybe the best way to handle both ways would be to add a parameter in the operator `update_config: bool = False`. And the update behavior would only happens in case this parameter is set to True?
   
   This would be a straightforward solution for sure! But it would be an API change, and we can't put the genie back in the bottle: your change is already out and folks could be already using it with the expectation that it updates the job without the need for a flag. However, I think that's a pretty small contingent of folks, and we could consider this behaviour a bug, honestly, so fixing it in this way should probably not trigger back compat. I'd be interested to hear what others think. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] o-nikolas commented on issue #29960: GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version.

Posted by "o-nikolas (via GitHub)" <gi...@apache.org>.
o-nikolas commented on issue #29960:
URL: https://github.com/apache/airflow/issues/29960#issuecomment-1466528195

   Thanks for the reply @potiuk, and I also see some :+1: responses, so I think that's consensus :smiley: 
   
   @romibuzi do you have the time and interest in creating a PR with the fix? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vincbeck commented on issue #29960: GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version.

Posted by "vincbeck (via GitHub)" <gi...@apache.org>.
vincbeck commented on issue #29960:
URL: https://github.com/apache/airflow/issues/29960#issuecomment-1460332433

   Oops! I did review this PR and did not see this. @romibuzi Any chance you can take a look at this issue and come up with a fix?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] romibuzi commented on issue #29960: GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version.

Posted by "romibuzi (via GitHub)" <gi...@apache.org>.
romibuzi commented on issue #29960:
URL: https://github.com/apache/airflow/issues/29960#issuecomment-1476231324

   I have proposed a PR adding this new parameter here: https://github.com/apache/airflow/pull/30162


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] romibuzi commented on issue #29960: GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version.

Posted by "romibuzi (via GitHub)" <gi...@apache.org>.
romibuzi commented on issue #29960:
URL: https://github.com/apache/airflow/issues/29960#issuecomment-1467700942

   @o-nikolas Yeah i can definitely work on it and submit a PR this week!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org