You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/06/08 18:44:38 UTC

[GitHub] [airflow] hx-markterry opened a new pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

hx-markterry opened a new pull request #16338:
URL: https://github.com/apache/airflow/pull/16338


   Related: https://github.com/apache/airflow/issues/16323
   
   The SFTP hook does not allow for a SSH private key string to be used, only a private key file path. As it is based on the SSH hook, which does allow for a private key string to be used, a small change could align these two hooks.
   
   This change first checks if a paramiko object is present after the SSH hook has been initialised, and the prefers that rather than the ssh key file path.
   
   This is more a working idea than a finished PR, If this approach is deemed acceptable I'll be happy to turn this into a fully formed code change.
   
   Thanks in advance.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#discussion_r659676604



##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -135,8 +135,13 @@ def get_conn(self) -> pysftp.Connection:
             }
             if self.password and self.password.strip():
                 conn_params['password'] = self.password
-            if self.key_file:
+
+            # Try to use the paramiko key from the SSH hook

Review comment:
       Would you please rebase to latest main @hx-markterry and add the doc explanation ? Also a unit test covering this case is needed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] 30blay commented on a change in pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
30blay commented on a change in pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#discussion_r734846701



##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -135,8 +135,13 @@ def get_conn(self) -> pysftp.Connection:
             }
             if self.password and self.password.strip():
                 conn_params['password'] = self.password
-            if self.key_file:
+
+            # Try to use the paramiko key from the SSH hook

Review comment:
       You're right, passing a private key file path as `private_key` does not work for SFTPHook, despite what the doc says. I will update the doc and implement your solution




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] lidalei commented on a change in pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
lidalei commented on a change in pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#discussion_r732790911



##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -135,8 +135,13 @@ def get_conn(self) -> pysftp.Connection:
             }
             if self.password and self.password.strip():
                 conn_params['password'] = self.password
-            if self.key_file:
+
+            # Try to use the paramiko key from the SSH hook

Review comment:
       Forgot to mention that, it didn't work for us.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] hx-markterry commented on a change in pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
hx-markterry commented on a change in pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#discussion_r673915248



##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -135,8 +135,13 @@ def get_conn(self) -> pysftp.Connection:
             }
             if self.password and self.password.strip():
                 conn_params['password'] = self.password
-            if self.key_file:
+
+            # Try to use the paramiko key from the SSH hook

Review comment:
       ok will do




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#issuecomment-929711243


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] 30blay commented on a change in pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
30blay commented on a change in pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#discussion_r734879490



##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -135,8 +135,13 @@ def get_conn(self) -> pysftp.Connection:
             }
             if self.password and self.password.strip():
                 conn_params['password'] = self.password
-            if self.key_file:
+
+            # Try to use the paramiko key from the SSH hook

Review comment:
       Done! In https://github.com/apache/airflow/pull/18988




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] lidalei commented on a change in pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
lidalei commented on a change in pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#discussion_r649058789



##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -135,8 +135,13 @@ def get_conn(self) -> pysftp.Connection:
             }
             if self.password and self.password.strip():
                 conn_params['password'] = self.password
-            if self.key_file:
+
+            # Try to use the paramiko key from the SSH hook

Review comment:
       suggestion: Better to remove line 107 and 108
   ```
                   if 'private_key' in extra_options:
                       self.key_file = extra_options.get('private_key')
   ```
   and update doc for SFTPHook.
   
   Note pysftp only supports RSA and DSS keys while SSHHook also supports ECDSA and Ed25519 keys. Worth to note in doc.
   
   Ref: https://bitbucket.org/dundeemt/pysftp/src/1c0791759688a733a558b1a25d9ae04f52cf6a64/pysftp/__init__.py#lines-165:171




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#issuecomment-929711243


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] lidalei commented on a change in pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
lidalei commented on a change in pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#discussion_r732789585



##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -135,8 +135,13 @@ def get_conn(self) -> pysftp.Connection:
             }
             if self.password and self.password.strip():
                 conn_params['password'] = self.password
-            if self.key_file:
+
+            # Try to use the paramiko key from the SSH hook

Review comment:
       @30blay Thx! I don't think private_key works for SFTPHook. SFTPHook inherits SSHHook. If we have an SFTP connection with `private_key` in extra fields, it will fail to initialize an SFTPHook.
   
   https://github.com/apache/airflow/blob/main/airflow/providers/sftp/hooks/sftp.py#L69
   https://github.com/apache/airflow/blob/main/airflow/providers/ssh/hooks/ssh.py#L155-L158




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] lidalei commented on a change in pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
lidalei commented on a change in pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#discussion_r649058789



##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -135,8 +135,13 @@ def get_conn(self) -> pysftp.Connection:
             }
             if self.password and self.password.strip():
                 conn_params['password'] = self.password
-            if self.key_file:
+
+            # Try to use the paramiko key from the SSH hook

Review comment:
       suggestion: Better to remove line 107 and 108
   ```
                   if 'private_key' in extra_options:
                       self.key_file = extra_options.get('private_key')
   ```
   and update doc for SFTPHook.
   
   Note pysftp only supports RSA and DSS keys while SSHHook also supports ECDSA and Ed25519 keys. Worth to note in doc.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#discussion_r688935347



##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -135,8 +135,13 @@ def get_conn(self) -> pysftp.Connection:
             }
             if self.password and self.password.strip():
                 conn_params['password'] = self.password
-            if self.key_file:
+
+            # Try to use the paramiko key from the SSH hook

Review comment:
       SFTPHook has several methods that are sftp-specific (such as retrieve_file/store_file but also many others). Those make no sense for "ssh hook". 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#issuecomment-857007418


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, itโ€™s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better ๐Ÿš€.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#issuecomment-898899780


   @hx-markterry are you going to rebase that one :)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] 30blay commented on pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
30blay commented on pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#issuecomment-943492539


   @eladkal Thanks, done! https://github.com/apache/airflow/pull/18988


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] aladinoss commented on a change in pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
aladinoss commented on a change in pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#discussion_r688912522



##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -135,8 +135,13 @@ def get_conn(self) -> pysftp.Connection:
             }
             if self.password and self.password.strip():
                 conn_params['password'] = self.password
-            if self.key_file:
+
+            # Try to use the paramiko key from the SSH hook

Review comment:
       @potiuk & @hx-markterry what is the benefit of using separate hook for sftp sensor and operator ?
   Is it better to change the documentation and use ssh hook for both sftp and ssh ?
   example:
   
   ```
   import logging
   from datetime import datetime
   from paramiko import SFTP_NO_SUCH_FILE
   from airflow.contrib.hooks.ssh_hook import SSHHook
   from airflow.operators.sensors import BaseSensorOperator
   from airflow.utils.decorators import apply_defaults
   from airflow.plugins_manager import AirflowPlugin
   
   log = logging.getLogger(__name__)
   
   
   class SgSFTPSensor(BaseSensorOperator):
       """
       Waits for a file or directory to be present on SFTP.
   
       :param path: Remote file or directory path
       :type path: str
       :param ssh_conn_id: The connection to run the sensor against
       :type shh_conn_id: str
       """
       template_fields = ('path',)
   
       @apply_defaults
       def __init__(self, path, shh_conn_id='ssh_default', *args, **kwargs):
           super(SgSFTPSensor, self).__init__(*args, **kwargs)
           self.path = path
           self.hook = None
           self.shh_conn_id = shh_conn_id
   
       def get_mod_time(self, path):
           ssh_hook = SSHHook(ssh_conn_id=self.shh_conn_id)
           conn = ssh_hook.get_conn().open_sftp()
           ftp_mdtm = conn.stat(path).st_mtime
           return datetime.fromtimestamp(ftp_mdtm).strftime('%Y%m%d%H%M%S')
   
       def poke(self, context):
           self.log.info('Poking for %s', self.path)
           try:
               self.get_mod_time(self.path)
           except IOError as e:
               log.info(SFTP_NO_SUCH_FILE)
               if e.errno != SFTP_NO_SUCH_FILE:
                   raise e
               log.info(f"{self.path} NOT FOUND, sensor will retry!")
               return False
           log.info(f"{self.path} FOUND, sensor will exit now.")
           return True
   
   
   class SgSFTPPlugin(AirflowPlugin):
       name = "sgsftpplugin"
       sensors = [
           SgSFTPSensor
       ]
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#issuecomment-943478497


   @30blay since the PR is stale you can take over and open a new PR.
   You can preserve the original commits so it will be co-authored by the original contributor and you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] aladinoss commented on a change in pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
aladinoss commented on a change in pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#discussion_r688912522



##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -135,8 +135,13 @@ def get_conn(self) -> pysftp.Connection:
             }
             if self.password and self.password.strip():
                 conn_params['password'] = self.password
-            if self.key_file:
+
+            # Try to use the paramiko key from the SSH hook

Review comment:
       @potiuk & @hx-markterry what is the benefit of using separate hook for sftp sensor and operator ?
   Is it better to change the documentation and use ssh hook for both sftp and ssh ?
   example:
   
   '''
   import logging
   from datetime import datetime
   from paramiko import SFTP_NO_SUCH_FILE
   from airflow.contrib.hooks.ssh_hook import SSHHook
   from airflow.operators.sensors import BaseSensorOperator
   from airflow.utils.decorators import apply_defaults
   from airflow.plugins_manager import AirflowPlugin
   
   log = logging.getLogger(__name__)
   
   
   class SgSFTPSensor(BaseSensorOperator):
       """
       Waits for a file or directory to be present on SFTP.
   
       :param path: Remote file or directory path
       :type path: str
       :param ssh_conn_id: The connection to run the sensor against
       :type shh_conn_id: str
       """
       template_fields = ('path',)
   
       @apply_defaults
       def __init__(self, path, shh_conn_id='ssh_default', *args, **kwargs):
           super(SgSFTPSensor, self).__init__(*args, **kwargs)
           self.path = path
           self.hook = None
           self.shh_conn_id = shh_conn_id
   
       def get_mod_time(self, path):
           ssh_hook = SSHHook(ssh_conn_id=self.shh_conn_id)
           conn = ssh_hook.get_conn().open_sftp()
           ftp_mdtm = conn.stat(path).st_mtime
           return datetime.fromtimestamp(ftp_mdtm).strftime('%Y%m%d%H%M%S')
   
       def poke(self, context):
           self.log.info('Poking for %s', self.path)
           try:
               self.get_mod_time(self.path)
           except IOError as e:
               log.info(SFTP_NO_SUCH_FILE)
               if e.errno != SFTP_NO_SUCH_FILE:
                   raise e
               log.info(f"{self.path} NOT FOUND, sensor will retry!")
               return False
           log.info(f"{self.path} FOUND, sensor will exit now.")
           return True
   
   
   class SgSFTPPlugin(AirflowPlugin):
       name = "sgsftpplugin"
       sensors = [
           SgSFTPSensor
       ]
   '''




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] 30blay commented on pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
30blay commented on pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#issuecomment-943440885


   @hx-markterry I made this [PR](https://github.com/hx-markterry/airflow/pull/1) on your fork a week ago, to move this forward. Have you seen it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] closed pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #16338:
URL: https://github.com/apache/airflow/pull/16338


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] 30blay commented on a change in pull request #16338: SFTP hook to prefer the SSH paramiko key over the key file path

Posted by GitBox <gi...@apache.org>.
30blay commented on a change in pull request #16338:
URL: https://github.com/apache/airflow/pull/16338#discussion_r732756242



##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -135,8 +135,13 @@ def get_conn(self) -> pysftp.Connection:
             }
             if self.password and self.password.strip():
                 conn_params['password'] = self.password
-            if self.key_file:
+
+            # Try to use the paramiko key from the SSH hook

Review comment:
       @lidalei I like your suggestion because it makes the definition of private_key consistent between SFTPHook and SSHHook.
   
   But it would be a breaking change for clients who currently pass a file path to SFTPHook's private_key. How should I go about that?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org