Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/12 17:01:31 UTC

[GitHub] [airflow] saveriogzz opened a new issue #15332: SmartSftpSensor leveraging RegEx or fnmatch to look for patterns

saveriogzz opened a new issue #15332:
URL: https://github.com/apache/airflow/issues/15332


   **Description**
   
   A SmartSftpSensor that can search for patterns (RegEx or UNIX fnmatch) in file or folder names.
   
   **Use case / motivation**
   
   I would like to be able to use wildcards and/or regular expressions to look for certain files when using an SFTPSensor.  
   So far I have tried something like this:
   
   ```python
   from airflow.providers.sftp.sensors.sftp import SFTPSensor
   from airflow.plugins_manager import AirflowPlugin
   from airflow.utils.decorators import apply_defaults
   from typing import Any
   
   import os
   import fnmatch
   
   class SmartSftpSensor(SFTPSensor):
       poke_context_fields = ('path', 'filepattern', 'sftp_conn_id', ) # <- Required fields
       template_fields = ['filepattern', 'path']
   
       @apply_defaults
       def __init__(
               self, 
               filepattern="",
               **kwargs: Any):
   
           super().__init__(**kwargs)
           self.filepath = self.path
           self.filepattern = filepattern
   
       def poke(self, context):
           full_path = self.filepath
   
           directory = os.listdir(full_path)
   
           for file in directory:
               if fnmatch.fnmatch(file, self.filepattern):
                   context['task_instance'].xcom_push(key='file_name', value=file)
                   return True
           return False
   
       def is_smart_sensor_compatible(self): # <- Required
           result = (
               not self.soft_fail
               and super().is_smart_sensor_compatible()
           )
           return result
   
   class MyPlugin(AirflowPlugin):
       name = "my_plugin"
       operators = [SmartSftpSensor]
   ```
   And I call it like this:
   
   ```python
   sense_file = SmartSftpSensor(
       task_id='sense_file',
       sftp_conn_id='my_sftp_connection',
       path=templ_remote_filepath,
       filepattern=filename,
       timeout=3
   )
   ```
   where `path` is the remote folder containing the files, and `filepattern` is a templated filename with wildcards, `filename = """{{ execution_date.strftime("%y%m%d_%H00??_P??_???") }}.LV1"""`, which renders to, e.g., `210412_1600??_P??_???.LV1`.
   
   However, I am still not getting the expected result: the sensor does not match any files.
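A hedged reading of why nothing is captured (not a confirmed diagnosis): `os.listdir` in `poke` lists the Airflow worker's *local* filesystem, not the SFTP server. Also, `self.filepath` is copied from `self.path` inside `__init__`, before Airflow renders the templated `path` field, so it may still hold the raw Jinja string. The sketch below keeps the matching logic independent of Airflow. The directory listing is passed in as a callable; inside the sensor this would presumably be `SFTPHook(self.sftp_conn_id).list_directory`. Matching uses `fnmatch.fnmatchcase`, so it stays case-sensitive whatever the worker's OS. `find_first_match`, the `/remote/in` path and the filenames are hypothetical.

```python
import fnmatch
from datetime import datetime
from typing import Callable, List, Optional


def find_first_match(
    list_directory: Callable[[str], List[str]],
    directory: str,
    pattern: str,
) -> Optional[str]:
    """Return the first name in `directory` matching the glob `pattern`, or None.

    `list_directory` should list the *remote* directory; os.listdir would only
    see the worker's local disk.
    """
    for name in sorted(list_directory(directory)):
        # fnmatchcase is case-sensitive on every OS, unlike fnmatch.fnmatch
        if fnmatch.fnmatchcase(name, pattern):
            return name
    return None


# The same pattern the Jinja template builds, rendered here with plain strftime
# for an example execution date of 2021-04-12 16:00.
pattern = datetime(2021, 4, 12, 16, 0).strftime("%y%m%d_%H00??_P??_???") + ".LV1"
print(pattern)  # 210412_1600??_P??_???.LV1

# Stand-in for a remote directory listing (hypothetical filenames).
fake_remote = {
    "/remote/in": [
        "210412_150000_P01_abc.LV1",
        "210412_160012_P01_abc.LV1",
        "readme.txt",
    ]
}
match = find_first_match(lambda path: fake_remote[path], "/remote/in", pattern)
print(match)  # 210412_160012_P01_abc.LV1
```

Inside `poke`, reading `self.path` (which is listed in `template_fields`) instead of the copied `self.filepath` should pick up the rendered value.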
   
   **Are you willing to submit a PR?**
   Yes!
   
   **Related Issues**
   
   I didn't find any.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] blcksrx commented on issue #15332: SftpSensor w/ possibility to use RegEx or fnmatch

Posted by GitBox <gi...@apache.org>.
blcksrx commented on issue #15332:
URL: https://github.com/apache/airflow/issues/15332#issuecomment-818948301


   You can use wildcards in SFTPSensor.


[GitHub] [airflow] blcksrx commented on issue #15332: SftpSensor w/ possibility to use RegEx or fnmatch

blcksrx commented on issue #15332:
URL: https://github.com/apache/airflow/issues/15332#issuecomment-821319755


   That works on *nix systems that provide a shell, where it's convenient to use wildcards like this:
   ```python
   hook.get_conn().execute("ls PATH/*.csv")
   ```
   But this is too low-level and doesn't work in every case, so I'm going to prepare a PR that adds regex support.
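Instead of relying on the remote shell to expand `ls PATH/*.csv`, the directory listing can be filtered on the Airflow side with a regular expression. The sketch below shows the general idea, not the code of the proposed PR. It assumes the names come from an SFTP directory listing (for example `SFTPHook.list_directory`). `filter_by_regex` is a hypothetical helper. `fnmatch.translate` is used only to show that a glob such as `*.csv` converts to an equivalent regex.

```python
import fnmatch
import re
from typing import Iterable, List


def filter_by_regex(names: Iterable[str], regex: str) -> List[str]:
    """Return, sorted, the names that match `regex` over the whole name."""
    compiled = re.compile(regex)
    return sorted(name for name in names if compiled.fullmatch(name))


# Hypothetical listing, e.g. the result of SFTPHook(...).list_directory(path)
listing = ["a.csv", "b.CSV", "c.csv.bak", "210412_160012_P01_abc.LV1"]

# Regex equivalent of the shell glob `*.csv` (case-sensitive, whole name).
csv_files = filter_by_regex(listing, fnmatch.translate("*.csv"))
print(csv_files)  # ['a.csv']

# A regex can express things a glob cannot, e.g. "exactly two digits after _P".
lv1_files = filter_by_regex(listing, r"\d{6}_\d{6}_P\d{2}_\w{3}\.LV1")
print(lv1_files)  # ['210412_160012_P01_abc.LV1']
```

Filtering after listing keeps the behaviour the same on any SFTP server, because it no longer depends on which shell (if any) the remote side provides.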


[GitHub] [airflow] saveriogzz commented on issue #15332: SftpSensor w/ possibility to use RegEx or fnmatch

saveriogzz commented on issue #15332:
URL: https://github.com/apache/airflow/issues/15332#issuecomment-819402752


   Hey @blcksrx, would you mind giving some more details on how to use them? If I simply use the wildcard pattern written above with Airflow's built-in SFTPSensor, it doesn't capture anything.
   Thanks in advance!!
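A likely explanation, offered as an assumption rather than a verified fact: in the Airflow 2.0-era SFTP provider, the built-in `SFTPSensor.poke` checks the path with `SFTPHook.get_mod_time`. That is an SFTP `stat` of the literal path, and the SFTP protocol itself performs no wildcard expansion, which is done by shells. The local-filesystem analogy below shows the same effect with `os.stat` versus `glob.glob`. It only illustrates literal-path checks and does not exercise the sensor. `literal_exists` is a hypothetical helper.

```python
import glob
import os
import tempfile


def literal_exists(path: str) -> bool:
    """Existence check that treats `path` literally, like a stat call."""
    try:
        os.stat(path)
        return True
    except OSError:  # FileNotFoundError on POSIX; other OSErrors on Windows
        return False


with tempfile.TemporaryDirectory() as tmp:
    # Create one real file, then look for it via a wildcard pattern.
    open(os.path.join(tmp, "report.csv"), "w").close()
    pattern = os.path.join(tmp, "*.csv")
    stat_says = literal_exists(pattern)  # no file is literally named "*.csv"
    glob_says = [os.path.basename(p) for p in glob.glob(pattern)]

print(stat_says)  # False
print(glob_says)  # ['report.csv']
```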
