You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/09/09 02:05:44 UTC

[GitHub] [airflow] potiuk commented on pull request #25980: Deprecate AWS S3 Connection Type

potiuk commented on PR #25980:
URL: https://github.com/apache/airflow/pull/25980#issuecomment-1241415888

   I'd remove it. It might break someone's workflow (https://xkcd.com/1172/) but I tihnk we have not promised any "special" behaviour here. 
   
   I can imagine users who have S3 connections defined and use get_hook() to instantate the hook dynamically.  This has been used for example in GenerictTransfer indeed but also I've seen it used in custom operators to depend on the conn_type to return particular Hook. We use it heavily in all SQL operators. BUT SQL operators have DBApiHook as base and follow the same protocols (and we unified them even more now via common.sql provider). 
   
   S3 Has no such promisses, no common interface, if there is any similarity with similar (GCS/Azure storage) hooks, this is at most incidental and I think this is about time that we don't care about those incidental similarities. Not having common "protocol" makes it next to impossible to use the hooks "dynamically" created by get_hook() depending on the connection type (though I Imagine someone doing '(S3Hook) get_hook("s3_conn")` rather `than S3Hook(conn_id=s3_conn)`. We would break the former workflow if we remove S3 conn_type - but as long as we  describe it in changelog (including how ot conver it) and bump major version of Amazon provider, I have completely no problem with it.
   
   Even the GenericTransfer @dstandish  mentioned is Generic-ish, not truly Generic. It assumes that source Hook has "get_records" method and target hook has "run" (if preoperator is set) and "insert_rows' method. And when you look at the implementations - only SQL hooks follow it, so this really "Generic SQL transfer" operator. 
   
   I personally think "get_hook()" method only makes sense in SQL in our case and it useless elsewhere (though some individual workflows might use it and custom Hooks can take advantage of it).
   
   I think personally that unless we have a well defined protocol to follow, any "generic" use of conn_type is next to useless - and in S3 case, I woudl not worry about deleting it.
   
   However there is one thing that I think SHOULD be handled - rather than deprecatng it, I'd simply convert it dynamically to AWS automatically. currently, when someone has it defined and opens it in UI the configuration (keys etc)  will be editable and saved. If we remove conn_type, the first time it is opened in the UI, I believe it will be converted dynamically to a "default" connection (not sure) and some information might be lost. So I would try to handle the scenario when somoene has S3 connection defined in the DB, opens it and it gets automatically converted to AWS one - not sure how to do it best though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org