You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/05 08:38:40 UTC

[GitHub] [airflow] saveriogzz opened a new pull request #17436: Support fnmatch pattern in SFTP Hook and Sensor

saveriogzz opened a new pull request #17436:
URL: https://github.com/apache/airflow/pull/17436


   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   Effort to give users the possibility to just operates on specific files using the SFTP provider. Please refer to the old PR below for details.
   
   related
   - issue: #15332
   - PR by @blcksrx : #15409
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] saveriogzz closed pull request #17436: Support fnmatch pattern in SFTP Hook and Sensor

Posted by GitBox <gi...@apache.org>.
saveriogzz closed pull request #17436:
URL: https://github.com/apache/airflow/pull/17436


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #17436: Support fnmatch pattern in SFTP Hook and Sensor

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #17436:
URL: https://github.com/apache/airflow/pull/17436#issuecomment-893276480


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, itโ€™s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better ๐Ÿš€.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #17436: Support fnmatch pattern in SFTP Hook and Sensor

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #17436:
URL: https://github.com/apache/airflow/pull/17436#issuecomment-893312844


   I think my https://github.com/apache/airflow/pull/15409#discussion_r616309811 still applies. `fnmatch` is indeed a better solution, but we need to design the functions better. Mayne leave `retrieve_file` along and implement a new `retrieve_files` instead?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr edited a comment on pull request #17436: Support fnmatch pattern in SFTP Hook and Sensor

Posted by GitBox <gi...@apache.org>.
uranusjr edited a comment on pull request #17436:
URL: https://github.com/apache/airflow/pull/17436#issuecomment-893312844






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr edited a comment on pull request #17436: Support fnmatch pattern in SFTP Hook and Sensor

Posted by GitBox <gi...@apache.org>.
uranusjr edited a comment on pull request #17436:
URL: https://github.com/apache/airflow/pull/17436#issuecomment-893312844






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #17436: Support fnmatch pattern in SFTP Hook and Sensor

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #17436:
URL: https://github.com/apache/airflow/pull/17436#issuecomment-893312844


   I think my https://github.com/apache/airflow/pull/15409#discussion_r616309811 still applies. `fnmatch` is indeed a better solution, but we need to design the functions better. Mayne leave `retrieve_file` along and implement a new `retrieve_files` instead?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] saveriogzz commented on a change in pull request #17436: Support fnmatch pattern in SFTP Hook and Sensor

Posted by GitBox <gi...@apache.org>.
saveriogzz commented on a change in pull request #17436:
URL: https://github.com/apache/airflow/pull/17436#discussion_r683254408



##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -212,7 +218,9 @@ def delete_directory(self, path: str) -> None:
         conn = self.get_conn()
         conn.rmdir(path)
 
-    def retrieve_file(self, remote_full_path: str, local_full_path: str) -> None:
+    def retrieve_file(
+        self, remote_full_path: str, local_full_path: str, fnmatch_pattern: Optional[str] = None
+    ) -> None:

Review comment:
       @uranusjr, in #15409, already mentioned that this implementation could cause headaches to users inasmuch "it subtly changes semantics depending on the arguments". We can start from here and think about a better solution.

##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -250,14 +265,22 @@ def delete_file(self, path: str) -> None:
         conn = self.get_conn()
         conn.remove(path)
 
-    def get_mod_time(self, path: str) -> str:
+    def get_mod_time(self, path: str, fnmatch_pattern: Optional[str] = None) -> str:
         """
         Returns modification time.
 
         :param path: full path to the remote file
         :type path: str
+        :param fnmatch_pattern: optional pattern to filter the remote_full_path files
+        :type: fnmatch_pattern: Optional[str]
         """
         conn = self.get_conn()
+        if fnmatch_pattern:
+            for file in conn.listdir(path):
+                if fnmatch(file, fnmatch_pattern):
+                    path = file
+                    break
+

Review comment:
       @uranusjr , how about this method called by the [SFTPSensor](https://github.com/apache/airflow/blob/main/airflow/providers/sftp/sensors/sftp.py#L49)?
   Should we create a different method to be able to catch all the files matching the pattern or we just wait for the the first of the list as it is implemented now?

##########
File path: airflow/providers/sftp/sensors/sftp.py
##########
@@ -57,13 +57,23 @@ def __init__(
 
     def poke(self, context: dict) -> bool:
         self.hook = SFTPHook(self.sftp_conn_id)
-        self.log.info('Poking for %s', self.path)
-        try:
-            mod_time = self.hook.get_mod_time(self.path, self.fnmatch_pattern)
-            self.log.info('Found File %s last modified: %s', str(self.path), str(mod_time))
-        except OSError as e:
-            if e.errno != SFTP_NO_SUCH_FILE:
-                raise e
-            return False
+        if not self.fnmatch_pattern:
+            self.log.info('Poking for %s', self.path)
+            try:
+                mod_time = self.hook.get_mod_time(self.path)
+                self.log.info('Found File %s last modified: %s', str(self.path), str(mod_time))
+            except OSError as e:
+                if e.errno != SFTP_NO_SUCH_FILE:
+                    raise e
+                return False
+        else:
+            self.log.info('Poking for files matching pattern %s into %s', self.fnmatch_pattern, self.path)
+            try:
+                mod_time = self.hook.get_mod_time_pattern(self.path, self.fnmatch_pattern)
+                self.log.info('Found Files last modified %s', str(mod_time))
+            except OSError as e:
+                if e.errno != SFTP_NO_SUCH_FILE:
+                    raise e
+                return False

Review comment:
       here we have two options:
   - not having specified `fnmatch_pattern`
   - having specified it
   
   In the first case, the parameter `path` is as usual the full path to the file.
   In the second case, the `path` will be the path to the directory whilst `fnmatch_pattern` will be useful to catch all the files we are interested in.
   In this second case, the new method [get_mod_time_pattern](https://github.com/saveriogzz/airflow/blob/AIRFLOW-15332/airflow/providers/sftp/hooks/sftp.py#L311), which returns a dictionaries with full paths to files as keys and their modification time as values.
   

##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -265,25 +284,51 @@ def delete_file(self, path: str) -> None:
         conn = self.get_conn()
         conn.remove(path)
 
-    def get_mod_time(self, path: str, fnmatch_pattern: Optional[str] = None) -> str:
+    def delete_file_pattern(self, dir_path: str, fnmatch_pattern: str) -> None:
         """
-        Returns modification time.
+        Removes files matching the pattern on the FTP Server
+
+        :param dir_path: full path to the remote directory
+        :type dir_path: str
+        :param fnmatch_pattern: pattern to filter the dir_path files
+        :type fnmatch_pattern: str
+        """
+        conn = self.get_conn()
+        for file in conn.listdir(dir_path):

Review comment:
       this is wrong because it should raise errors in case no matches are found, at the moment it's not




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] saveriogzz commented on pull request #17436: Support fnmatch pattern in SFTP Hook and Sensor

Posted by GitBox <gi...@apache.org>.
saveriogzz commented on pull request #17436:
URL: https://github.com/apache/airflow/pull/17436#issuecomment-895115486


   I'm going ahead closing this because I realized that the contribution is far from being ready to merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #17436: Support fnmatch pattern in SFTP Hook and Sensor

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #17436:
URL: https://github.com/apache/airflow/pull/17436#issuecomment-895098128


   It seems like your rebase went wrong.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #17436: Support fnmatch pattern in SFTP Hook and Sensor

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #17436:
URL: https://github.com/apache/airflow/pull/17436#issuecomment-893276480


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, itโ€™s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better ๐Ÿš€.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org