Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/07/19 06:47:16 UTC

[GitHub] [airflow] baolsen commented on issue #10874: SSHHook get_conn() does not re-use client

baolsen commented on issue #10874:
URL: https://github.com/apache/airflow/issues/10874#issuecomment-882286124


   Thank you all for the feedback.
   
   My use case is to zip and split files on a remote server when they exceed a certain size.
   
   To do this, I do the following over SSH:
   - List the input files and check their sizes
   - Zip and split the largest ones using a Python script that I send over SSH
   - Check the output by iterating over the output files
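As a rough sketch of the first step, the listing command and the size check could look like the code below. It assumes the remote host has GNU find (`-printf` is a GNU extension, with `%s` for the size in bytes and `%p` for the path). The function names and the threshold are hypothetical and are not part of the original operator.

```python
import shlex
from typing import List, Tuple


def build_list_command(remote_dir: str) -> str:
    """Hypothetical helper: a shell command printing "<size-in-bytes> <path>" per file.

    Assumes GNU find on the remote host, since -printf is a GNU extension.
    """
    return f"find {shlex.quote(remote_dir)} -maxdepth 1 -type f -printf '%s %p\\n'"


def select_large_files(listing: str, threshold_bytes: int) -> List[Tuple[str, int]]:
    """Hypothetical helper: parse the listing, keep (path, size) above the threshold."""
    large = []
    for line in listing.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split only on the first space, so paths containing spaces survive.
        size_str, path = line.split(" ", 1)
        size = int(size_str)
        if size > threshold_bytes:
            large.append((path, size))
    # Largest files first, since those are the ones to zip and split.
    large.sort(key=lambda item: item[1], reverse=True)
    return large
```

The command's stdout, decoded to text, would be passed to `select_large_files()`, and the resulting paths would feed the zip/split step.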
   
   I put all of this in a single task using a CustomSSHOperator that inherits from SSHOperator. I want my task to:
   
   - Be able to resume if the process fails on file N
   - Keep support simple, so that only one task needs to be re-run if it fails
   - Avoid the performance issues the Airflow 1.10.x scheduler has when there are many tasks
   
   In my environment, connections (both local and remote) are expensive, complex and error-prone.
   This is mainly because of how corporate authentication is implemented at my site.
   
   However, SSHOperator does not allow multiple commands to be run over one connection, even when subclassed.
   So I had to find a way to run multiple commands over the same connection.
   
   I could have subclassed SSHOperator and copied all of the "boilerplate" client code.
   But that code is complex and doesn't directly add value to my use case, and I don't want to maintain and update it myself.
   
   So I ended up, for each command, setting self.command and calling super().execute(), as that was simplest.
   I agree that being convenient for me doesn't make it the right way to solve this ;)
   
   One option could be to refactor SSHOperator to isolate the "run an SSH command" part so that subclasses can reuse it.
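One possible shape for that refactor is sketched below: a standalone helper that runs a single command on an already-open client and returns its exit status and output, so a subclass could call it several times within one execute(). The names `run_ssh_command`, `run_all` and `CommandResult` are hypothetical, not Airflow API. The sketch relies only on paramiko's `SSHClient.exec_command()`, which returns `(stdin, stdout, stderr)` file-like objects, and on `stdout.channel.recv_exit_status()`.

```python
from typing import Any, List, NamedTuple


class CommandResult(NamedTuple):
    exit_status: int
    stdout: str
    stderr: str


def run_ssh_command(client: Any, command: str, timeout: float = 60.0) -> CommandResult:
    """Hypothetical helper: run one command on an open paramiko-style SSHClient.

    The client is passed in rather than created here, so the caller controls
    how long the connection lives and can reuse it for several commands.
    """
    # exec_command() opens a new channel on the existing connection.
    _stdin, stdout, stderr = client.exec_command(command, timeout=timeout)
    # Sequential reads keep the sketch simple; very large stderr output
    # may require draining both streams concurrently.
    out = stdout.read().decode("utf-8", errors="replace")
    err = stderr.read().decode("utf-8", errors="replace")
    # Blocks until the remote command exits, then returns its exit code.
    status = stdout.channel.recv_exit_status()
    return CommandResult(status, out, err)


def run_all(client: Any, commands: List[str]) -> List[CommandResult]:
    """Run several commands over the same client, stopping at the first failure."""
    results = []
    for command in commands:
        result = run_ssh_command(client, command)
        results.append(result)
        if result.exit_status != 0:
            break
    return results
```

Keeping connection management out of the helper is the design choice here: the operator (or hook) opens the client once, and each step of the task is just another `run_ssh_command()` call.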
   
   But we don't seem to have decided yet whether the hook is a better place for this. If it is, I think the changes could be more extensive and have a wider impact.
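If the hook does turn out to be the better place, one option would be for get_conn() to cache the client and hand back the same one while its transport is still active. The sketch below is hypothetical: `CachingSSHHook` and its `connect` factory parameter are not Airflow API. It only assumes paramiko's `SSHClient.get_transport()`, `Transport.is_active()` and `SSHClient.close()`.

```python
from typing import Any, Callable, Optional


class CachingSSHHook:
    """Hypothetical hook that reuses one SSH client across get_conn() calls.

    `connect` is any zero-argument callable returning a connected
    paramiko-style SSHClient.
    """

    def __init__(self, connect: Callable[[], Any]) -> None:
        self._connect = connect
        self._client: Optional[Any] = None

    def _is_alive(self) -> bool:
        if self._client is None:
            return False
        transport = self._client.get_transport()
        return transport is not None and transport.is_active()

    def get_conn(self) -> Any:
        # Reconnect only if there is no client yet or its transport has died.
        if not self._is_alive():
            self._client = self._connect()
        return self._client

    def close(self) -> None:
        # Explicitly tear down the cached client, e.g. at the end of a task.
        if self._client is not None:
            self._client.close()
            self._client = None
```

One catch with this design: any caller that uses the returned client as a context manager closes it on exit, so the cached client would be dead on the next call and this sketch would simply reconnect. Changing that calling convention is part of why a hook-level change could be more far-reaching.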


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
