Posted to commits@airflow.apache.org by "romibuzi (via GitHub)" <gi...@apache.org> on 2023/02/08 11:12:26 UTC
[GitHub] [airflow] romibuzi commented on issue #29423: GlueJobOperator throws error after migration to newest version of Airflow
romibuzi commented on issue #29423:
URL: https://github.com/apache/airflow/issues/29423#issuecomment-1422427027
Hi @vgutkovsk!
Oh damn, indeed I realize I introduced a breaking change. Before, the check `if self.s3_bucket is None` was done only when the operator was creating the job. Now it is done at the start of the `create_glue_job_config()` method here: https://github.com/apache/airflow/blob/44024564cb3dd6835b0375d61e682efc1acd7d2c/airflow/providers/amazon/aws/hooks/glue.py#L103-L104
And this method is called in all cases here: https://github.com/apache/airflow/blob/44024564cb3dd6835b0375d61e682efc1acd7d2c/airflow/providers/amazon/aws/hooks/glue.py#L328
I realize `s3_bucket` is only used to determine `s3_log_path`: https://github.com/apache/airflow/blob/44024564cb3dd6835b0375d61e682efc1acd7d2c/airflow/providers/amazon/aws/hooks/glue.py#L112
`script_location`, on the other hand, can be None and is not concatenated with `s3_bucket` at all.
Maybe the best way to handle the problem would be to remove this check on `s3_bucket`, and if it is None, omit the `"LogUri"` parameter (which uses `s3_log_path`), since it is not a mandatory parameter for a Glue job: https://github.com/apache/airflow/blob/44024564cb3dd6835b0375d61e682efc1acd7d2c/airflow/providers/amazon/aws/hooks/glue.py#L118
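A minimal sketch of that idea as a standalone function (the names `s3_bucket`, `s3_glue_logs` and the log-path layout mirror the hook, but this is an illustration of the proposed behavior, not the actual hook code):

```python
def create_glue_job_config(job_name, script_location, s3_bucket=None, s3_glue_logs="logs/"):
    """Build a Glue job config dict, adding LogUri only when an S3 bucket is given."""
    config = {
        "Name": job_name,
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
    }
    # LogUri is optional for a Glue job, so when s3_bucket is None we simply
    # omit it instead of raising an error as the current check does.
    if s3_bucket is not None:
        config["LogUri"] = f"s3://{s3_bucket}/{s3_glue_logs}{job_name}/"
    return config
```

With this shape, operators that never pass `s3_bucket` would keep working after the migration, since the `None` check disappears entirely.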
cc @Taragolis
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org