Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/29 16:12:07 UTC

[GitHub] [airflow] NBardelot opened a new issue #15069: Filesystem sensor, glob, and remote FS (like with SFTPHook)

NBardelot opened a new issue #15069:
URL: https://github.com/apache/airflow/issues/15069


   Hi, I cannot currently reproduce this as a bug, but I'm confident people will run into it in the long run (a colleague of mine just did on Airflow 1.10, and I believe the behaviour is the same in Airflow 2). I'm opening this issue to at least document the subject.
   
   The filesystem sensor has used glob matching since this PR: https://github.com/apache/airflow/pull/5358
   
   Yet the sensor accepts hooks that refer to a remote FS just as readily as hooks for a local one, and glob does not handle that.
   
   On the one hand, the Python documentation states that glob() uses a mix of os.scandir() and fnmatch.fnmatch(), which makes the code suitable only for a local FS. On the other hand, Airflow provides hooks like the SFTPHook which manage a remote FS (not reachable through "os"), and those hooks are accepted by the sensor via inheritance.
   
   Thus, using a path with a glob pattern together with a hook to a remote FS should lead to inconsistent behaviour:
   - either you're lucky and glob() does not find an equivalent path locally, so it reports that the path does not exist (the sensor will never trigger);
   - or, in the worse case, the sensor triggers for a file that exists locally but not on the remote FS as expected (a false trigger).
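
   The local-only behaviour of glob can be illustrated with a minimal sketch. It uses only the standard library and a temporary directory; the file names and the helper `local_glob_matches` are made up for the example and are not part of Airflow. Whatever connection a hook holds, glob.glob() only ever consults the filesystem of the machine running the code:

```python
import glob
import os
import tempfile
from typing import List


def local_glob_matches(pattern: str) -> List[str]:
    """Return the paths matching `pattern` on the LOCAL filesystem only."""
    # glob.glob() has no notion of a remote connection: it walks local
    # directories and filters the names it finds there.
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file that happens to exist on the worker itself.
    open(os.path.join(tmp, "report_2021.csv"), "w").close()

    # A pattern "meant" for a remote server still matches the local file
    # (the false trigger case) ...
    hits = local_glob_matches(os.path.join(tmp, "report_*.csv"))

    # ... and a pattern with no local equivalent never matches
    # (the sensor never triggers case).
    misses = local_glob_matches(os.path.join(tmp, "missing_*.csv"))
```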
   
   In my opinion this should be fixed in two ways:
   
   1. the hook should declare whether it supports globbing (hook.hasGlobbing() -> true/false; false by default), and the sensor should adapt its behaviour accordingly
   2. the sensor should avoid relying on globs (which are not 100% portable) and allow things like startsWith or endsWith path searches instead (implemented as a directory listing + lookup, which would be the portable way to do things)
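
   As a rough sketch of the "directory listing + lookup" idea in point 2, assuming only that a hook can provide some callable returning the file names in a directory. The names `find_matches`, `fake_list_directory` and `FAKE_REMOTE` are hypothetical and exist only for this example; the in-memory dictionary stands in for a remote server:

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical in-memory stand-in for a remote filesystem.
FAKE_REMOTE: Dict[str, List[str]] = {
    "/incoming": ["report_2021.csv", "report_2021.tmp", "notes.txt"],
}


def fake_list_directory(path: str) -> List[str]:
    """Pretend to list a remote directory; a real hook would do I/O here."""
    return FAKE_REMOTE.get(path, [])


def find_matches(
    list_directory: Callable[[str], Iterable[str]],
    directory: str,
    prefix: str = "",
    suffix: str = "",
) -> List[str]:
    """Return names in `directory` that start with `prefix` and end with `suffix`.

    Only the listing callable touches the filesystem, so the same lookup
    works whether the listing is local or remote.
    """
    return [
        name
        for name in list_directory(directory)
        if name.startswith(prefix) and name.endswith(suffix)
    ]


found = find_matches(fake_list_directory, "/incoming", prefix="report_", suffix=".csv")
nothing = find_matches(fake_list_directory, "/incoming", prefix="missing_")
```

   With this shape the sensor never needs to know where the files live: it would poke successfully as soon as the returned list is non-empty.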
   
   BR, and thanks for the existing code, bug or not :)

