Posted to commits@airflow.apache.org by "romibuzi (via GitHub)" <gi...@apache.org> on 2023/02/08 11:12:26 UTC

[GitHub] [airflow] romibuzi commented on issue #29423: GlueJobOperator throws error after migration to newest version of Airflow

romibuzi commented on issue #29423:
URL: https://github.com/apache/airflow/issues/29423#issuecomment-1422427027

   Hi @vgutkovsk!
   
   Oh damn, indeed, I realize I introduced a breaking change. Previously the check `if self.s3_bucket is None` was done only when the operator was creating the job. Now it is done at the start of the `create_glue_job_config()` method here: https://github.com/apache/airflow/blob/44024564cb3dd6835b0375d61e682efc1acd7d2c/airflow/providers/amazon/aws/hooks/glue.py#L103-L104
   
   And this method is called in all cases here: https://github.com/apache/airflow/blob/44024564cb3dd6835b0375d61e682efc1acd7d2c/airflow/providers/amazon/aws/hooks/glue.py#L328
   
   I realize `s3_bucket` is only used to determine `s3_log_path`: https://github.com/apache/airflow/blob/44024564cb3dd6835b0375d61e682efc1acd7d2c/airflow/providers/amazon/aws/hooks/glue.py#L112
   
   `script_location`, on the other hand, can be None and is not concatenated with `s3_bucket` at all.
   
   Maybe the best way to handle the problem would be to remove this check on `s3_bucket`, and if it is None, simply omit the `"LogUri"` parameter (which is built from `s3_log_path`), since it is not a mandatory parameter for a Glue job: https://github.com/apache/airflow/blob/44024564cb3dd6835b0375d61e682efc1acd7d2c/airflow/providers/amazon/aws/hooks/glue.py#L118
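
   To make the proposal concrete, here is a minimal sketch of what I have in mind. The function name and parameters loosely mirror `create_glue_job_config()`, but this is illustrative pseudocode of the idea, not the actual hook code; `s3_log_prefix` is a made-up name for whatever prefix the hook uses:

   ```python
   def create_glue_job_config(job_name, role_arn, script_location=None,
                              s3_bucket=None, s3_log_prefix="logs"):
       # Base config: only the fields that are always required.
       config = {
           "Name": job_name,
           "Role": role_arn,
           "Command": {"Name": "glueetl", "ScriptLocation": script_location},
       }
       # Instead of raising when s3_bucket is None, just skip "LogUri":
       # it is an optional parameter for a Glue job.
       if s3_bucket is not None:
           config["LogUri"] = f"s3://{s3_bucket}/{s3_log_prefix}/{job_name}"
       return config
   ```

   With that, an operator configured without `s3_bucket` would still get a valid config dict, just without log output to S3, instead of failing at config creation time.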
   
   cc @Taragolis 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org