You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/06/16 11:42:40 UTC

[GitHub] [airflow] potiuk edited a comment on pull request #8974: Add VaultHook to hashicorp provider

potiuk edited a comment on pull request #8974:
URL: https://github.com/apache/airflow/pull/8974#issuecomment-644710427


   > One for the VaultHook, and one for the extra auth types -- if a user is wanting more Vault auth types and is looking through the changelog they couldn't know it's been added.
   
   That's fine - I can split it, but I would like to have clarity on the architecture so I do not have to again refactor all the tests. 
   
   @ashb -> Can you please take a look at the way it is implemented now? It is also super important to figure out if we are all OK with this approach as I already started doing that for other Secret backends.
   
   For now, I moved the client to `hooks/vault.py` but I still think it makes sense to keep it as a separate entity without dependency on whether we parse authentication information from the connection or from the Backend dictionary stored in the configuration.
   
   Let me explain my thinking behind the architecture and maybe you will see why I have done it this way.
   
   Specifically, - those are taken from connection always (and you should not pass them IMHO via hook initialization parameters). Those are the "basic" parameters that Connection provides always (and main reason why Connection is encrypted with Fernet Key - to keep them secret):
   
   ```
               username=self.connection.login,
               password=self.connection.password,
               key_id=self.connection.login,
               secret_id=self.connection.password,
   ```
   
   I believe this is what the `Hook` meaning is in Airflow - it should take the connection and read all the "basic" information from the connection. Then maybe reads some extras (also from connection) - those being overridable via initialization parameters. 
   
   This is is how I see a Hook "pattern" in Airflow. Hooks can use other libraries to make the calls. I see the "client" as internal library that hides the complexity of the actual authentication method to use based on parameters passed.
   
   I see a few options of using Backend <> Hook in this scenario:
   
   1)  we build an artificial connection and pass it to hook (very bad idea IMHO) 
   
   2) Get Secret to use Hook directly. For that we will have to change the Hook to accept the basic parameters (host/port etc.) not only from connection but also if there is a lack of connection it should have additional parameters for host, port, user, passwords (or maybe just url)  from the parameters passed. I believe this is what you propose @ashb ? Correct?
   
   3) Implement intermediary used by both Hook and Secret - (Client in my case) which is library used by both - Hook and Backend. Client has no relation to Connection - it's Hook's job to provide all the parameters to the Client parsed from the connection.
   
   
   For 2), I think exposing password/URL/login via Hook constructor is a bad idea. The hook should encourage connection as the only source of the connectivity information - because it is encrypted and shared with other workers by design. Exposing host/url/ etc. and allowing connection to be None is a bad idea for Hook IMHO. When you look at BaseHook, the only "instance" method in this is what you see -> clearly you expect some relation with the Connection object:
   
   ```
       @classmethod
       def get_connections(cls, conn_id: str) -> List[Connection]:
   
       @classmethod
       def get_connection(cls, conn_id: str) -> Connection:
   
       @classmethod
       def get_hook(cls, conn_id: str) -> "BaseHook":
   
       def get_conn(self):
   ```
   
   I'd love to understand what's really wrong with approach 2) in this context.
   
   Is the problem with documentation duplication?  I think the separation of concerns is fairly solid here. 
    One thing that I might consider to remove some of the documentation duplication (which is clearly there) to make a Named Tuple to keep all the Vault parameters and use it in Client and Hook.
   
   Please let me know what you think @kaxil - I'd love your opinion as well before I move forward with it.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org