Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/12 17:01:31 UTC
[GitHub] [airflow] saveriogzz opened a new issue #15332: SmartSftpSensor leveraging RegEx or fnmatch to look for patterns
saveriogzz opened a new issue #15332:
URL: https://github.com/apache/airflow/issues/15332
**Description**
SmartSftpSensor with the ability to search for patterns (RegEx or UNIX fnmatch) in file or folder names.
**Use case / motivation**
I would like to be able to use wildcards and/or regular expressions to look for certain files when using an SftpSensor.
At the moment I have tried something like this:
```python
from airflow.providers.sftp.sensors.sftp import SFTPSensor
from airflow.plugins_manager import AirflowPlugin
from airflow.utils.decorators import apply_defaults
from typing import Any
import os
import fnmatch


class SmartSftpSensor(SFTPSensor):
    poke_context_fields = ('path', 'filepattern', 'sftp_conn_id', )  # <- Required fields
    template_fields = ['filepattern', 'path']

    @apply_defaults
    def __init__(
            self,
            filepattern="",
            **kwargs: Any):
        super().__init__(**kwargs)
        self.filepath = self.path
        self.filepattern = filepattern

    def poke(self, context):
        full_path = self.filepath
        directory = os.listdir(full_path)
        for file in directory:
            if not fnmatch.fnmatch(file, self.filepattern):
                pass
            else:
                context['task_instance'].xcom_push(key='file_name', value=file)
                return True
        return False

    def is_smart_sensor_compatible(self):  # <- Required
        result = (
            not self.soft_fail
            and super().is_smart_sensor_compatible()
        )
        return result


class MyPlugin(AirflowPlugin):
    name = "my_plugin"
    operators = [SmartSftpSensor]
```
And I call it like this:
```python
sense_file = SmartSftpSensor(
    task_id='sense_file',
    sftp_conn_id='my_sftp_connection',
    path=templ_remote_filepath,
    filepattern=filename,
    timeout=3
)
```
where `path` is the folder containing the files and `filepattern` is a templated filename with wildcards: `filename = """{{ execution_date.strftime("%y%m%d_%H00??_P??_???") }}.LV1"""`, which is rendered to e.g. `210412_1600??_P??_???.LV1`.
However, I am still not getting the expected result: the sensor does not capture anything.
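A possible reason nothing is captured, offered as a guess rather than a confirmed diagnosis: `os.listdir` lists the local filesystem of the Airflow worker, not the SFTP server, and `self.filepath` is copied from `self.path` in `__init__`, before template fields are rendered. The sketch below only checks the matching step in isolation. The helper `first_match` and the list `remote_names` are made up for illustration; in a sensor the names would presumably come from the SFTP hook (for example `SFTPHook.list_directory`) instead.

```python
import fnmatch
from datetime import datetime
from typing import Iterable, Optional


def first_match(names: Iterable[str], pattern: str) -> Optional[str]:
    """Return the first name (in sorted order) matching the shell-style pattern."""
    for name in sorted(names):
        # fnmatchcase is case-sensitive on every platform.
        if fnmatch.fnmatchcase(name, pattern):
            return name
    return None


# Render the pattern the same way the Jinja template in the issue does.
execution_date = datetime(2021, 4, 12, 16)
pattern = execution_date.strftime("%y%m%d_%H00??_P??_???") + ".LV1"

# Hypothetical remote listing, standing in for what the SFTP server returns.
remote_names = [
    "210412_150000_P01_001.LV1",
    "210412_160012_P01_001.LV1",
    "readme.txt",
]

match = first_match(remote_names, pattern)
print(pattern, "->", match)
```

If the pattern matches here but the sensor still returns `False`, the listing step (local vs. remote directory, or an unrendered path) is the more likely culprit than the wildcard itself.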
**Are you willing to submit a PR?**
Yes!
**Related Issues**
I didn't find any
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] blcksrx commented on issue #15332: SftpSensor w/ possibility to use RegEx or fnmatch
Posted by GitBox <gi...@apache.org>.
blcksrx commented on issue #15332:
URL: https://github.com/apache/airflow/issues/15332#issuecomment-818948301
You can use the wildcard in SFTPSensor
[GitHub] [airflow] blcksrx commented on issue #15332: SftpSensor w/ possibility to use RegEx or fnmatch
Posted by GitBox <gi...@apache.org>.
blcksrx commented on issue #15332:
URL: https://github.com/apache/airflow/issues/15332#issuecomment-821319755
That seems to apply to *nix servers that provide a shell, where it's convenient to use wildcards like this:
```
hook.get_conn().execute("ls PATH/*.csv")
```
But that is too raw and not usable in every case. I'm going to prepare a PR that uses regex instead.
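As a rough illustration of how the two matching styles relate (not the implementation of the planned PR), Python's `fnmatch.translate` turns a shell-style pattern into a regular expression, so a single regex-based code path could in principle serve both. The helper `matching_names` and the file names are hypothetical.

```python
import fnmatch
import re
from typing import Iterable, List


def matching_names(names: Iterable[str], regex: str) -> List[str]:
    """Return the names that fully match the given regular expression."""
    compiled = re.compile(regex)
    return [name for name in names if compiled.fullmatch(name)]


# Hypothetical directory listing.
names = ["report_01.csv", "report_02.csv", "notes.txt"]

# A plain regular expression...
by_regex = matching_names(names, r"report_\d+\.csv")

# ...and a shell-style wildcard converted to an equivalent regex.
by_glob = matching_names(names, fnmatch.translate("*.csv"))

print(by_regex)
print(by_glob)
```

Matching on the client side like this avoids depending on a remote shell, at the cost of listing the whole directory first.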
[GitHub] [airflow] saveriogzz commented on issue #15332: SftpSensor w/ possibility to use RegEx or fnmatch
Posted by GitBox <gi...@apache.org>.
saveriogzz commented on issue #15332:
URL: https://github.com/apache/airflow/issues/15332#issuecomment-819402752
Hey @blcksrx, would you mind giving some more details on how to use them? If I simply use the wildcard pattern written above with Airflow's built-in SFTPSensor, it doesn't capture anything.
Thanks in advance!