Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/09/07 08:12:40 UTC

[GitHub] [druid] gianm commented on pull request #13027: Use standard library to correctly glob and stop at the correct folder structure when filtering cloud objects

gianm commented on PR #13027:
URL: https://github.com/apache/druid/pull/13027#issuecomment-1239058349

   Hmm. IMO, we should definitely change something, since the behavior of `FilenameUtils.wildcardMatch` is just really weird. For example, this returns `true`:
   
   ```
   FilenameUtils.wildcardMatch("a/b/c.txt", "a*.txt")
   ```
   
   This is weird, since no real shell works this way. Generally, `*` is expected not to match `/`.
   
   This patch fixes the weirdness by switching to a globbing implementation where `*` properly doesn't match `/`, and by changing the examples to use `**.suffix` (which _does_ match `/` in normal shells) instead of `*.suffix`.
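   
   To make the difference concrete, here is a minimal sketch using the JDK's `java.nio.file.PathMatcher` with `glob:` syntax. I'm assuming this is roughly the kind of "standard library" globbing the patch has in mind; the class and method names (`PathGlobSketch`, `pathMatches`) are made up for illustration and are not from the patch.
   
   ```java
   import java.nio.file.FileSystems;
   import java.nio.file.PathMatcher;
   import java.nio.file.Paths;
   
   // Hypothetical helper, not Druid code: path-based glob matching with the JDK.
   class PathGlobSketch {
       // Returns true if the whole relative path matches the glob.
       // In java.nio glob syntax, `*` stays within one path segment and `**` may cross segments.
       static boolean pathMatches(String glob, String path) {
           PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
           return matcher.matches(Paths.get(path));
       }
   
       public static void main(String[] args) {
           System.out.println(pathMatches("a*.txt", "a/b/c.txt")); // false: `*` does not match `/`
           System.out.println(pathMatches("**.txt", "a/b/c.txt")); // true: `**` does match `/`
           System.out.println(pathMatches("*.txt", "c.txt"));      // true
       }
   }
   ```
   
   With this style of matching, the `a*.txt` pattern from the example above no longer matches `a/b/c.txt`, and users who want everything ending in `.txt` under a prefix need to write `**.txt`.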
   
   However, there is another way to fix it that IMO is better. We could change the `filter` glob to match file _names_ rather than _paths_. Then, `*.suffix` would still work fine. It's closer to what the `local` input source does. It's also close to what the `find` Unix command does when you do `find [directory] -name [glob]`. (It searches in directory for files whose names match the provided glob.)
   
   I like this way better because it avoids the awkward `**` construction in the examples, and avoids the need for people to think about entire paths: they can simply think about the file names. (One reason to avoid working with entire paths is that it gets weird with cloud files. Like, in `s3://a/b`, will the path-glob be applied to `s3://a/b`, or `/a/b/`, or `/b`, or `b`? Better to dodge the question entirely by using names, i.e. apply it to `b` alone.)
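   
   A rough sketch of the name-based alternative, under a few assumptions: the object key or URI uses `/` as its separator, the "name" is whatever follows the last `/`, and the JDK glob matcher is used for the match. `NameGlobSketch`, `fileName`, and `nameMatches` are hypothetical names for illustration only, not existing Druid methods.
   
   ```java
   import java.nio.file.FileSystems;
   import java.nio.file.PathMatcher;
   import java.nio.file.Paths;
   
   // Hypothetical helper, not Druid code: apply the glob to the file name only.
   class NameGlobSketch {
       // Everything after the last '/' is treated as the file name (assumption).
       static String fileName(String objectKey) {
           int slash = objectKey.lastIndexOf('/');
           return slash < 0 ? objectKey : objectKey.substring(slash + 1);
       }
   
       // Matches the glob against the name alone, similar in spirit to `find [directory] -name [glob]`.
       static boolean nameMatches(String glob, String objectKey) {
           String name = fileName(objectKey);
           if (name.isEmpty()) {
               return false; // key ends with '/', e.g. a folder placeholder
           }
           PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
           return matcher.matches(Paths.get(name));
       }
   
       public static void main(String[] args) {
           System.out.println(fileName("s3://a/b"));                 // b
           System.out.println(nameMatches("*.txt", "a/b/c.txt"));    // true
           System.out.println(nameMatches("*.json", "a/b/c.txt"));   // false
       }
   }
   ```
   
   Because only the last segment is matched, `*.suffix` keeps working as it does today, and the question of which form of the full path the glob sees never comes up.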


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

