You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flume.apache.org by qiao wen <31...@qq.com> on 2016/07/24 10:37:40 UTC

Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/
-----------------------------------------------------------

Review request for Flume.


Repository: flume-git


Description
-------

In our log management project, we wan't to track many log files like this:
/app/dir1/log.*
/app/dir2/log.*
...
/app/dirn/log.*
But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
a1.sources.r1.filegroups.fg = /app/*/log.*


Diffs
-----

  flume-ng-doc/sphinx/FlumeUserGuide.rst 1334500 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 

Diff: https://reviews.apache.org/r/50378/diff/


Testing
-------

All tests in TestTaildirSource passed.


Thanks,

qiao wen


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.

> On \u4e03\u6708 26, 2016, 3:44 p.m., Denes Arvay wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java, line 264
> > <https://reviews.apache.org/r/50378/diff/2/?file=1451879#file1451879line264>
> >
> >     In general I think this approach is not the best, as it has to iterate over the `"*?[{"` string every time, whereas with a `Set` it could be faster. As the length of the pattern string is only 4, it mightn't be worth the change, mostly because of the char->Character autoboxing's overhead.
> >     But with a `BitSet` we could have a bit (no pun intended) better performace (although I don't have numbers):
> >     ```
> >       private static final BitSet GLOB_WILDCARD_BIT_SET = new BitSet(8);
> >       static {
> >         for (char ch : "*?[{".toCharArray()) {
> >           GLOB_WILDCARD_BIT_SET.set(ch);
> >         }
> >       }
> >     ```
> >     and then check with `GLOB_WILDCARD_BIT_SET.get(path.charAt(i))`
> >     
> >     But again, I just wanted to share my opinion with you, I'm definitely not insisting on this change.

Thanks. I checked Glob.java in package sun.nio.fs,  code is as follows:
private static boolean isRegexMeta(char var0) {
    return ".^$+{[]|()".indexOf(var0) != -1;
  }

  private static boolean isGlobMeta(char var0) {
    return "\*?[{".indexOf(var0) != -1;
  }


> On \u4e03\u6708 26, 2016, 3:44 p.m., Denes Arvay wrote:
> > flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java, lines 269-271
> > <https://reviews.apache.org/r/50378/diff/2/?file=1451880#file1451880line269>
> >
> >     are these `sleep`s really needed?

I delete this test.


On \u4e03\u6708 26, 2016, 3:44 p.m., qiao wen wrote:
> > Thanks for the patch. Besides the line-comments I've added there are a handful of minor formatting issues which make the checkstyle fail, please fix those too. You can run the checkstyle with `mvn clean verify -DskipTests -Drat.skip`. Let me know if you have any issues running it, I'm happy to help.

Thanks for your review. I haved runed checkstyle and modify the code.


- qiao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review143526
-----------------------------------------------------------


On \u4e03\u6708 25, 2016, 3:40 p.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated \u4e03\u6708 25, 2016, 3:40 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 1334500 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by Denes Arvay <de...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review143526
-----------------------------------------------------------




flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 123)
<https://reviews.apache.org/r/50378/#comment209341>

    could you please add to the documentation that if there is a wildcard in the directory name then the `cachePatternMatching` is forced to false?



flume-ng-doc/sphinx/FlumeUserGuide.rst (line 1123)
<https://reviews.apache.org/r/50378/#comment209343>

    A bit more explicit description would be better, e.g. "Glob pattern wildcards (see: -> link to https://docs.oracle.com/javase/7/docs/api/java/nio/file/FileSystem.html#getPathMatcher(java.lang.String) ) are allowed..."



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 223)
<https://reviews.apache.org/r/50378/#comment209362>

    it's not needed to implement this method, the super implementation is basically the same (does some not-null checks and returns `CONTINUE`)



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (lines 238 - 241)
<https://reviews.apache.org/r/50378/#comment209385>

    it would be good to log the exception instead of silently swallowing it. Something similar as in line 244



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (lines 250 - 270)
<https://reviews.apache.org/r/50378/#comment209365>

    Please write some test for this function. It seems to work as expected but I'd be more confident if there were some tests for the edge cases at least (e.g. expression character at the beginning, etc).
    And it would also make sure that things won't brake if this method changes.



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 251)
<https://reviews.apache.org/r/50378/#comment209368>

    The name of this method doesn't really describe what it does. (finding the first wildcard and trimming the path)
    something like `trimPathBeforeWildcards` would be better



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 255)
<https://reviews.apache.org/r/50378/#comment209347>

    In general I think this approach is not the best, as it has to iterate over the `"*?[{"` string every time, whereas with a `Set` it could be faster. As the length of the pattern string is only 4, it mightn't be worth the change, mostly because of the char->Character autoboxing's overhead.
    But with a `BitSet` we could have a bit (no pun intended) better performace (although I don't have numbers):
    ```
      private static final BitSet GLOB_WILDCARD_BIT_SET = new BitSet(8);
      static {
        for (char ch : "*?[{".toCharArray()) {
          GLOB_WILDCARD_BIT_SET.set(ch);
        }
      }
    ```
    and then check with `GLOB_WILDCARD_BIT_SET.get(path.charAt(i))`
    
    But again, I just wanted to share my opinion with you, I'm definitely not insisting on this change.



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 263)
<https://reviews.apache.org/r/50378/#comment209348>

    `== true` is not needed, the code is cleaner without it.



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java (line 147)
<https://reviews.apache.org/r/50378/#comment209374>

    could you please add some javadoc to explain the purpose of this test?



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java (line 148)
<https://reviews.apache.org/r/50378/#comment209350>

    could you please simplify this by extracting the repeated lines to a method?
    
    eg:
    `private File createTempFile(String path)`
    which could create the parent dir, the file itself and write the content. The content can be the absolute path for example, in that case we don't need an extra parameter for that, and the test could do `assertTrue(out.contains(f.getAbsolutePath()));`



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java (line 151)
<https://reviews.apache.org/r/50378/#comment209351>

    it's better to use the `File(File parent, String child)` constructor, e.g. `File f1 = new File(tmpDir, "fg1/dir1/subdir/file1.txt");` instead of string concatenation.



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java (line 241)
<https://reviews.apache.org/r/50378/#comment209376>

    could you please add some javadoc here too?



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java (lines 268 - 270)
<https://reviews.apache.org/r/50378/#comment209355>

    are these `sleep`s really needed?


Thanks for the patch. Besides the line-comments I've added there are a handful of minor formatting issues which make the checkstyle fail, please fix those too. You can run the checkstyle with `mvn clean verify -DskipTests -Drat.skip`. Let me know if you have any issues running it, I'm happy to help.

- Denes Arvay


On July 25, 2016, 3:40 p.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated July 25, 2016, 3:40 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 1334500 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by Denes Arvay <de...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review144071
-----------------------------------------------------------


Fix it, then Ship it!




3 minor comments otherwise LGTM


flume-ng-doc/sphinx/FlumeUserGuide.rst (line 1124)
<https://reviews.apache.org/r/50378/#comment210059>

    Please also add that if there is pattern in the directory name then the `cachePatternMatching` will be overwritten to `false`.



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 128)
<https://reviews.apache.org/r/50378/#comment210061>

    It'd be better to log this only if the "incoming" `cachePatternValue` is `true` to avoid confusion.



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 252)
<https://reviews.apache.org/r/50378/#comment210060>

    please add @VisibleForTesting annotation


- Denes Arvay


On July 28, 2016, 3:18 p.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated July 28, 2016, 3:18 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 1334500 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by Denes Arvay <de...@cloudera.com>.

> On July 29, 2016, 2:49 p.m., Attila Simon wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java, lines 241-251
> > <https://reviews.apache.org/r/50378/diff/4/?file=1457767#file1457767line241>
> >
> >     Missing short circuit on surely not matching huge subdirs. You need to override the preVisitDirectory for that. (Also I would recommend overriding visitFileFailed for robustness)
> 
> qiao wen wrote:
>     I add overriding visitFileFailed. But I don't override preVisitDirectory. @Denes said "it's not needed to implement this method, the super implementation is basically the same (does some not-null checks and returns CONTINUE)".

Your original implementation of `preVisitDirectory` simply returned `CONTINUE`, which is the same as the super implementation. But I didn't take into account the possible optimization so I agree with Attila that it would be good to short circuit if possible.


- Denes


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review144092
-----------------------------------------------------------


On July 30, 2016, 5:33 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated July 30, 2016, 5:33 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.

> On \u4e03\u6708 29, 2016, 2:49 p.m., Attila Simon wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
> > Line 225 (original), 226-228 (patched)
> > <https://reviews.apache.org/r/50378/diff/4/?file=1457767#file1457767line240>
> >
> >     performance downgrade due to the idempotent instantiations of matchers
> 
> qiao wen wrote:
>     I agree with you. But is there any good idea?
> 
> eskrm wrote:
>     Extract the matchers out to the enclosing class and finalize?
> 
> Denes Arvay wrote:
>     I second this, extracting the matchers will save unnecessary instance creation and pattern parsing. It's possible to move both of the matchers to the `TailDirMatcher` class.
> 
> qiao wen wrote:
>     OK, thanks very much. I will fix that.

Please review the latest code.


- qiao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review144092
-----------------------------------------------------------


On \u4e09\u6708 13, 2017, 9 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated \u4e09\u6708 13, 2017, 9 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst d863068 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java 8838320 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java a107a01 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java f2347f3 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirEventReader.java bcfe4bb 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> 
> Diff: https://reviews.apache.org/r/50378/diff/6/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.

> On \u4e03\u6708 29, 2016, 2:49 p.m., Attila Simon wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java, lines 240-242
> > <https://reviews.apache.org/r/50378/diff/4/?file=1457767#file1457767line240>
> >
> >     performance downgrade due to the idempotent instantiations of matchers
> 
> qiao wen wrote:
>     I agree with you. But is there any good idea?
> 
> eskrm wrote:
>     Extract the matchers out to the enclosing class and finalize?
> 
> Denes Arvay wrote:
>     I second this, extracting the matchers will save unnecessary instance creation and pattern parsing. It's possible to move both of the matchers to the `TailDirMatcher` class.

OK, thanks very much. I will fix that.


- qiao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review144092
-----------------------------------------------------------


On \u4e03\u6708 30, 2016, 5:33 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated \u4e03\u6708 30, 2016, 5:33 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by es...@protonmail.com.

> On July 29, 2016, 2:49 p.m., Attila Simon wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java, lines 240-242
> > <https://reviews.apache.org/r/50378/diff/4/?file=1457767#file1457767line240>
> >
> >     performance downgrade due to the idempotent instantiations of matchers
> 
> qiao wen wrote:
>     I agree with you. But is there any good idea?

Extract the matchers out to the enclosing class and finalize?


- eskrm


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review144092
-----------------------------------------------------------


On July 30, 2016, 5:33 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated July 30, 2016, 5:33 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by Denes Arvay <de...@cloudera.com>.

> On July 29, 2016, 4:49 p.m., Attila Simon wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java, lines 240-242
> > <https://reviews.apache.org/r/50378/diff/4/?file=1457767#file1457767line240>
> >
> >     performance downgrade due to the idempotent instantiations of matchers
> 
> qiao wen wrote:
>     I agree with you. But is there any good idea?
> 
> eskrm wrote:
>     Extract the matchers out to the enclosing class and finalize?

I second this, extracting the matchers will save unnecessary instance creation and pattern parsing. It's possible to move both of the matchers to the `TailDirMatcher` class.


> On July 29, 2016, 4:49 p.m., Attila Simon wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java, lines 241-251
> > <https://reviews.apache.org/r/50378/diff/4/?file=1457767#file1457767line241>
> >
> >     Missing short circuit on surely not matching huge subdirs. You need to override the preVisitDirectory for that. (Also I would recommend overriding visitFileFailed for robustness)
> 
> qiao wen wrote:
>     I add overriding visitFileFailed. But I don't override preVisitDirectory. @Denes said "it's not needed to implement this method, the super implementation is basically the same (does some not-null checks and returns CONTINUE)".
> 
> Denes Arvay wrote:
>     Your original implementation of `preVisitDirectory` simply returned `CONTINUE`, which is the same as the super implementation. But I didn't take into account the possible optimization so I agree with Attila that it would be good to short circuit if possible.

It might be worth to think a bit more on this, as it seems to be possible to optimize it by implementing the `preVisitDirectory` and using the `dirMatcher` to check whether any of the children of the given directory might match. It may be needed to split the `dirMatcher` by the directory separator character to do this.


- Denes


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review144092
-----------------------------------------------------------


On July 30, 2016, 7:33 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated July 30, 2016, 7:33 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.

> On \u4e03\u6708 29, 2016, 2:49 p.m., Attila Simon wrote:
> > flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java, line 162
> > <https://reviews.apache.org/r/50378/diff/4/?file=1457769#file1457769line162>
> >
> >     I think you just testing the filtering with this junit test. This can be done as well in the TestTaildirMatcher file without the need of creating a channel and transaction. Required complexity of such test would be closer to the unit idea.

I think tests in TestTaildirSource are used to test the config values, such as filename regex, header map usage and fileheader. Then it's ok to put this test in TestTaildirSource.


On \u4e03\u6708 29, 2016, 2:49 p.m., qiao wen wrote:
> > I think it is getting there to be committed. Some more iteration and it will be done. Keep up the good work!

Thanks for you review. It's a honor to contribute for flume.


- qiao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review144092
-----------------------------------------------------------


On \u4e03\u6708 29, 2016, 12:04 p.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated \u4e03\u6708 29, 2016, 12:04 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by es...@protonmail.com.

> On July 29, 2016, 2:49 p.m., Attila Simon wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java, line 239
> > <https://reviews.apache.org/r/50378/diff/4/?file=1457767#file1457767line239>
> >
> >     Is this solution follow symlinks? If not then it would be a breaking change.
> 
> qiao wen wrote:
>     Sorry?I don't understand the meaning. Could you please explain it more clearly.

This version of Files.walkFileTree won't follow symlinks in its traversal. If you were to have a symlinked directory with logs inside then none of them would get picked up. To follow symlinks you need to use the other version of Files.walkFileTree that takes a set of options and pass in EnumSet.of(FileVisitOption.FOLLOW_LINKS).


- eskrm


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review144092
-----------------------------------------------------------


On July 30, 2016, 5:33 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated July 30, 2016, 5:33 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.

> On \u4e03\u6708 29, 2016, 2:49 p.m., Attila Simon wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java, lines 268-288
> > <https://reviews.apache.org/r/50378/diff/4/?file=1457767#file1457767line268>
> >
> >     Usage of magic constants like '/', '\', "*?[{" most be avoided.

Thanks. I checked Glob.java in package sun.nio.fs,  code is as follows:
private static boolean isRegexMeta(char var0) {
    return ".^$+{[]|()".indexOf(var0) != -1;
  }

private static boolean isGlobMeta(char var0) {
    return "*?[{".indexOf(var0) != -1;
  }


> On \u4e03\u6708 29, 2016, 2:49 p.m., Attila Simon wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java, line 239
> > <https://reviews.apache.org/r/50378/diff/4/?file=1457767#file1457767line239>
> >
> >     Is this solution follow symlinks? If not then it would be a breaking change.

Sorry?I don't understand the meaning. Could you please explain it more clearly.


> On \u4e03\u6708 29, 2016, 2:49 p.m., Attila Simon wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java, lines 240-242
> > <https://reviews.apache.org/r/50378/diff/4/?file=1457767#file1457767line240>
> >
> >     performance downgrade due to the idempotent instantiations of matchers

I agree with you. But is there any good idea?


> On \u4e03\u6708 29, 2016, 2:49 p.m., Attila Simon wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java, line 274
> > <https://reviews.apache.org/r/50378/diff/4/?file=1457767#file1457767line274>
> >
> >     could you please explain why the second half was needed? (i >= 1 && path.charAt(i - 1) != '\')

'\' means the escape character. For example, path = "/dir/subdira/\*/subdirb/*/subdirc" , then trimPathBeforeFirstWildcard(path) returns "/dir/subdira/\*/subdirb" , not "/dir/subdira"


- qiao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review144092
-----------------------------------------------------------


On \u4e03\u6708 29, 2016, 12:04 p.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated \u4e03\u6708 29, 2016, 12:04 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.

> On \u4e03\u6708 29, 2016, 2:49 p.m., Attila Simon wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java, lines 241-251
> > <https://reviews.apache.org/r/50378/diff/4/?file=1457767#file1457767line241>
> >
> >     Missing short circuit on surely not matching huge subdirs. You need to override the preVisitDirectory for that. (Also I would recommend overriding visitFileFailed for robustness)

I add overriding visitFileFailed. But I don't override preVisitDirectory. @Denes said "it's not needed to implement this method, the super implementation is basically the same (does some not-null checks and returns CONTINUE)".


- qiao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review144092
-----------------------------------------------------------


On \u4e03\u6708 29, 2016, 12:04 p.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated \u4e03\u6708 29, 2016, 12:04 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by Attila Simon <sa...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review144092
-----------------------------------------------------------




flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 150)
<https://reviews.apache.org/r/50378/#comment210092>

    terminology "weakly consistent" is not defined by javadocs of SimpleFileVisitor



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 209)
<https://reviews.apache.org/r/50378/#comment210093>

    terminology "weakly consistent" is not defined by javadocs of SimpleFileVisitor



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 212)
<https://reviews.apache.org/r/50378/#comment210094>

    terminology "weakly consistent" is not defined by javadocs of SimpleFileVisitor



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 220)
<https://reviews.apache.org/r/50378/#comment210095>

    should be linked https://docs.oracle.com/javase/8/docs/api/java/nio/file/FileVisitor.html instead.



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 225)
<https://reviews.apache.org/r/50378/#comment210097>

    Is this solution follow symlinks? If not then it would be a breaking change.



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (lines 226 - 228)
<https://reviews.apache.org/r/50378/#comment210110>

    performance downgrade due to the idempotent instantiations of matchers



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (lines 227 - 237)
<https://reviews.apache.org/r/50378/#comment210108>

    Missing short circuit on surely not matching huge subdirs. You need to override the preVisitDirectory for that. (Also I would recommend overriding visitFileFailed for robustness)



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (lines 254 - 274)
<https://reviews.apache.org/r/50378/#comment210087>

    Usage of magic constants like '/', '\', "*?[{" most be avoided.



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (lines 254 - 274)
<https://reviews.apache.org/r/50378/#comment210090>

    wouldn't this be easier like any of this below:
    
      String trimPathBeforeFirstWildcard2(String path) {
        final String WILDCARDS = "*?{[";
        String[] dirs = path.split(File.pathSeparator);
        List<String> result = new LinkedList<String>();
        for (int i = 0; i < dirs.length; ++i) {
          if (StringUtils.containsAny(dirs[i], WILDCARDS)) {
            result.add(dirs[i]);
          }
        }
        if (result.isEmpty()) {
          return File.separator;
        } else {
          return StringUtils.join(result, File.separator);
        }
      }
    
      String trimPathBeforeFirstWildcard3(String path) {
        final String WILDCARDS = "*?{[";
        int endDirIdx = 0;
        for(int i = 0; i < path.length(); ++i) {
          char at = path.charAt(i);
          if (at == File.separatorChar) {
            endDirIdx = i;
          } else if (StringUtils.contains(WILDCARDS, at)) {
            break;
          } else {
            continue;
          }
        }
        if(endDirIdx > 0){
          return path.substring(0, endDirIdx);
        }else {
          return File.separator;
        }
      }



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 260)
<https://reviews.apache.org/r/50378/#comment210088>

    could you please explain why the second half was needed? (i >= 1 && path.charAt(i - 1) != '\')



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java (lines 148 - 157)
<https://reviews.apache.org/r/50378/#comment210102>

    would you mind linking the syntax explanation instead of copying it here?



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java (line 161)
<https://reviews.apache.org/r/50378/#comment210103>

    I think you just testing the filtering with this junit test. This can be done as well in the TestTaildirMatcher file without the need of creating a channel and transaction. Required complexity of such test would be closer to the unit idea.


I think it is getting there to be committed. Some more iteration and it will be done. Keep up the good work!

- Attila Simon


On July 29, 2016, 12:04 p.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated July 29, 2016, 12:04 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.

> On \u4e09\u6708 3, 2017, 2:44 p.m., Denes Arvay wrote:
> > First of all thank you for your effort by providing and constantly improving this patch.
> > I have two comments, both of them were already mentioned before, I just would like to give them more emphasis:
> > - please think a little bit on the `walkFileTree` optimization by implementing the `preVisitDirectory` (I'm not 100% sure that it's feasible, though) and also please move the `dirMatcher` and `fileMatcher` to the enclosing class.
> > - Following the symlinks should be added as well to keep the backward compatibility. (It's still an open issue from the initial code review rounds)

OK, I will take into account the two issues you mentioned.


- qiao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review167826
-----------------------------------------------------------


On \u4e03\u6708 30, 2016, 5:33 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated \u4e03\u6708 30, 2016, 5:33 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> 
> Diff: https://reviews.apache.org/r/50378/diff/5/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by Denes Arvay <de...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review167826
-----------------------------------------------------------



First of all thank you for your effort by providing and constantly improving this patch.
I have two comments, both of them were already mentioned before, I just would like to give them more emphasis:
- please think a little bit on the `walkFileTree` optimization by implementing the `preVisitDirectory` (I'm not 100% sure that it's feasible, though) and also please move the `dirMatcher` and `fileMatcher` to the enclosing class.
- Following the symlinks should be added as well to keep the backward compatibility. (It's still an open issue from the initial code review rounds)

- Denes Arvay


On July 30, 2016, 7:33 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated July 30, 2016, 7:33 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> 
> Diff: https://reviews.apache.org/r/50378/diff/5/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/
-----------------------------------------------------------

(Updated July 12, 2018, 1:32 a.m.)


Review request for Flume.


Changes
-------

Fix one code style problem.


Repository: flume-git


Description
-------

In our log management project, we wan't to track many log files like this:
/app/dir1/log.*
/app/dir2/log.*
...
/app/dirn/log.*
But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
a1.sources.r1.filegroups.fg = /app/*/log.*


Diffs (updated)
-----

  flume-ng-doc/sphinx/FlumeUserGuide.rst a4e08186 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java 633d3c19 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f7201 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java a107a017 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java f2347f38 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirEventReader.java e75543c7 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c3410548 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0ba 


Diff: https://reviews.apache.org/r/50378/diff/8/

Changes: https://reviews.apache.org/r/50378/diff/7-8/


Testing
-------

All tests in TestTaildirSource passed.


Thanks,

qiao wen


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/
-----------------------------------------------------------

(Updated 六月 20, 2017, 2:13 p.m.)


Review request for Flume.


Changes
-------

Resolve the configuration compatibility between minor versions. And rebase the code to the latest trunk.


Repository: flume-git


Description
-------

In our log management project, we wan't to track many log files like this:
/app/dir1/log.*
/app/dir2/log.*
...
/app/dirn/log.*
But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
a1.sources.r1.filegroups.fg = /app/*/log.*


Diffs (updated)
-----

  flume-ng-doc/sphinx/FlumeUserGuide.rst fde56ecc 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java 633d3c19 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f7201 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java a107a017 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java f2347f38 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirEventReader.java e75543c7 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c3410548 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0ba 


Diff: https://reviews.apache.org/r/50378/diff/7/

Changes: https://reviews.apache.org/r/50378/diff/6-7/


Testing
-------

All tests in TestTaildirSource passed.


Thanks,

qiao wen


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/
-----------------------------------------------------------

(Updated \u4e09\u6708 13, 2017, 9 a.m.)


Review request for Flume.


Changes
-------

Unify regex. Add two config FILE_GROUPS_SUFFIX_DIR and FILE_GROUPS_SUFFIX_FILE.


Repository: flume-git


Description
-------

In our log management project, we wan't to track many log files like this:
/app/dir1/log.*
/app/dir2/log.*
...
/app/dirn/log.*
But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
a1.sources.r1.filegroups.fg = /app/*/log.*


Diffs (updated)
-----

  flume-ng-doc/sphinx/FlumeUserGuide.rst d863068 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java 8838320 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java a107a01 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java f2347f3 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirEventReader.java bcfe4bb 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 


Diff: https://reviews.apache.org/r/50378/diff/6/

Changes: https://reviews.apache.org/r/50378/diff/5-6/


Testing
-------

All tests in TestTaildirSource passed.


Thanks,

qiao wen


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.

> On \u4e8c\u6708 28, 2017, 1:42 p.m., Bal�zs Don�t Bessenyei wrote:
> > flume-ng-doc/sphinx/FlumeUserGuide.rst, line 1124
> > <https://reviews.apache.org/r/50378/diff/5/?file=1458425#file1458425line1124>
> >
> >     Can you please reformat this so that the link would be correct?

How? The link is correct now.


- qiao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review167069
-----------------------------------------------------------


On \u4e03\u6708 30, 2016, 5:33 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated \u4e03\u6708 30, 2016, 5:33 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.

> On \u4e8c\u6708 28, 2017, 1:42 p.m., Bal�zs Don�t Bessenyei wrote:
> > flume-ng-doc/sphinx/FlumeUserGuide.rst
> > Lines 1124 (patched)
> > <https://reviews.apache.org/r/50378/diff/5/?file=1458425#file1458425line1124>
> >
> >     Can you please reformat this so that the link would be correct?
> 
> qiao wen wrote:
>     How? The link is correct now.
> 
> Bal�zs Don�t Bessenyei wrote:
>     I just meant some reformatting like "Wildcards are allowed in directory name. (See `getPathMatcher(String syntaxAndPattern) <https://docs.oracle.com/javase/7/docs/api/java/nio/file/FileSystem.html#getPathMatcher(java.lang.String)>`_ for patterns)"

OK, I get it. Thanks.


- qiao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review167069
-----------------------------------------------------------


On \u4e03\u6708 30, 2016, 5:33 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated \u4e03\u6708 30, 2016, 5:33 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> 
> Diff: https://reviews.apache.org/r/50378/diff/5/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by Balázs Donát Bessenyei <be...@apache.org>.

> On Feb. 28, 2017, 1:42 p.m., Bal�zs Don�t Bessenyei wrote:
> > flume-ng-doc/sphinx/FlumeUserGuide.rst
> > Lines 1124 (patched)
> > <https://reviews.apache.org/r/50378/diff/5/?file=1458425#file1458425line1124>
> >
> >     Can you please reformat this so that the link would be correct?
> 
> qiao wen wrote:
>     How? The link is correct now.

I just meant some reformatting like "Wildcards are allowed in directory name. (See `getPathMatcher(String syntaxAndPattern) <https://docs.oracle.com/javase/7/docs/api/java/nio/file/FileSystem.html#getPathMatcher(java.lang.String)>`_ for patterns)"


- Bal�zs Don�t


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review167069
-----------------------------------------------------------


On July 30, 2016, 5:33 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated July 30, 2016, 5:33 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> 
> Diff: https://reviews.apache.org/r/50378/diff/5/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.

> On \u4e8c\u6708 28, 2017, 1:42 p.m., Bal�zs Don�t Bessenyei wrote:
> > flume-ng-doc/sphinx/FlumeUserGuide.rst
> > Line 1153 (original), 1155 (patched)
> > <https://reviews.apache.org/r/50378/diff/5/?file=1458425#file1458425line1155>
> >
> >     Is there a way to unify regex vs. glob patterns in the path?
> >     Ie. either glob or regex all the way?

In FLUME-2960_6.patch, unify in regex. Please review the latest code.


- qiao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review167069
-----------------------------------------------------------


On \u4e09\u6708 13, 2017, 9 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated \u4e09\u6708 13, 2017, 9 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst d863068 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java 8838320 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java a107a01 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java f2347f3 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirEventReader.java bcfe4bb 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> 
> Diff: https://reviews.apache.org/r/50378/diff/6/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by Balázs Donát Bessenyei <be...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review167069
-----------------------------------------------------------




flume-ng-doc/sphinx/FlumeUserGuide.rst (line 1124)
<https://reviews.apache.org/r/50378/#comment239196>

    Can you please reformat this so that the link would be correct?



flume-ng-doc/sphinx/FlumeUserGuide.rst (line 1155)
<https://reviews.apache.org/r/50378/#comment239198>

    Is there a way to unify regex vs. glob patterns in the path?
    Ie. either glob or regex all the way?


Other than the things I've mentioned, the patch looks fine to me.

- Bal�zs Don�t Bessenyei


On July 30, 2016, 5:33 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated July 30, 2016, 5:33 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/
-----------------------------------------------------------

(Updated \u4e03\u6708 30, 2016, 5:33 a.m.)


Review request for Flume.


Changes
-------

Modify description related FileVistor in TaildirMatcher.


Repository: flume-git


Description
-------

In our log management project, we wan't to track many log files like this:
/app/dir1/log.*
/app/dir2/log.*
...
/app/dirn/log.*
But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
a1.sources.r1.filegroups.fg = /app/*/log.*


Diffs (updated)
-----

  flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 

Diff: https://reviews.apache.org/r/50378/diff/


Testing
-------

All tests in TestTaildirSource passed.


Thanks,

qiao wen


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/
-----------------------------------------------------------

(Updated \u4e03\u6708 29, 2016, 12:04 p.m.)


Review request for Flume.


Changes
-------

Update user guide. Modify TaildirMatcher.


Repository: flume-git


Description
-------

In our log management project, we wan't to track many log files like this:
/app/dir1/log.*
/app/dir2/log.*
...
/app/dirn/log.*
But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
a1.sources.r1.filegroups.fg = /app/*/log.*


Diffs (updated)
-----

  flume-ng-doc/sphinx/FlumeUserGuide.rst 3f08d8b 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 

Diff: https://reviews.apache.org/r/50378/diff/


Testing
-------

All tests in TestTaildirSource passed.


Thanks,

qiao wen


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/
-----------------------------------------------------------

(Updated \u4e03\u6708 28, 2016, 3:18 p.m.)


Review request for Flume.


Changes
-------

Modify the code according to the review. Run checkstyle. Add description and java doc. Add some test.


Repository: flume-git


Description
-------

In our log management project, we wan't to track many log files like this:
/app/dir1/log.*
/app/dir2/log.*
...
/app/dirn/log.*
But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
a1.sources.r1.filegroups.fg = /app/*/log.*


Diffs (updated)
-----

  flume-ng-doc/sphinx/FlumeUserGuide.rst 1334500 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java c341054 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 

Diff: https://reviews.apache.org/r/50378/diff/


Testing
-------

All tests in TestTaildirSource passed.


Thanks,

qiao wen


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by Attila Simon <sa...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review143531
-----------------------------------------------------------




flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (lines 123 - 127)
<https://reviews.apache.org/r/50378/#comment209346>

    Please add a logging statement that you overrode a configurable parameter. Also please use a utility function with a self documenting name or add some comment what is the idea behind this check. (Probably both would be the best)



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java (line 241)
<https://reviews.apache.org/r/50378/#comment209349>

    Sorry I didn't meant you to add this test because it is suboptimal (using channel and transactions just to check matching files). I just altered your existing test because I thought it will help getting to the point. 
    
    Please check out TestTaildirMatcher class instead for copy paste ready examples.


- Attila Simon


On July 25, 2016, 3:40 p.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated July 25, 2016, 3:40 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 1334500 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/
-----------------------------------------------------------

(Updated \u4e03\u6708 25, 2016, 3:40 p.m.)


Review request for Flume.


Changes
-------

Make cache work with wildcards.


Repository: flume-git


Description
-------

In our log management project, we wan't to track many log files like this:
/app/dir1/log.*
/app/dir2/log.*
...
/app/dirn/log.*
But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
a1.sources.r1.filegroups.fg = /app/*/log.*


Diffs (updated)
-----

  flume-ng-doc/sphinx/FlumeUserGuide.rst 1334500 
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 

Diff: https://reviews.apache.org/r/50378/diff/


Testing
-------

All tests in TestTaildirSource passed.


Thanks,

qiao wen


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by qiao wen <31...@qq.com>.

> On \u4e03\u6708 25, 2016, 1:45 p.m., Attila Simon wrote:
> > Hi,
> > 
> > Your change interfears with a caching mechanism. I believe the best way to describe this interference is this test below. The problem in a nutshell is when a wildcard is specified then the parent directory used by file matching is not the immediate parent but the directory high above. Thus its last modification time won't be updated when a new file was added. Only mtime of dir1 is updated when file2.txt is added but caching is initialized by fg1 directory so it only monitors that for changes. I think either a documentation should state that when using wildcards then caching won't work or even better an assertation should check this combination at startup. (Making cache work with wildcard would be the best).
> > 
> > 
> >   @Test
> >   public void testWildcardsDirFilteringCache() throws IOException, InterruptedException {
> >     //first iteration everything is working as expected
> >     File f1 = new File(tmpDir.getAbsolutePath() + "/fg1/dir1/file1.txt");
> >     Files.createParentDirs(f1);
> >     Files.write("file1\n", f1, Charsets.UTF_8);
> > 
> >     Context context = new Context();
> >     context.put(POSITION_FILE, posFilePath);
> >     context.put(FILE_GROUPS, "fg1");
> >     context.put(FILE_GROUPS_PREFIX + "fg1", tmpDir.getAbsolutePath() + "/fg1/*/file.*");
> > 
> >     Configurables.configure(source, context);
> >     source.start();
> >     source.process();
> >     Transaction txn = channel.getTransaction();
> >     txn.begin();
> >     List<String> out = Lists.newArrayList();
> >     for (int i = 0; i < 2; i++) {
> >       Event e = channel.take();
> >       if (e != null) {
> >         out.add(TestTaildirEventReader.bodyAsString(e));
> >       }
> >     }
> >     txn.commit();
> >     txn.close();
> > 
> >     // empty iterations simulating that time is passing by
> >     Thread.sleep(1000);
> >     source.process();
> >     Thread.sleep(1000);
> >     
> >     //file was created after a while it should be picked up as well
> >     File f2 = new File(tmpDir.getAbsolutePath() + "/fg1/dir1/file2.txt");
> >     Files.write("file2\n", f2, Charsets.UTF_8);
> > 
> >     source.process();
> >     txn = channel.getTransaction();
> >     txn.begin();
> >     for (int i = 0; i < 2; i++) {
> >       Event e = channel.take();
> >       if (e != null) {
> >         out.add(TestTaildirEventReader.bodyAsString(e));
> >       }
> >     }
> >     txn.commit();
> >     txn.close();
> >     
> >     assertEquals(2, out.size()); //fails as file2.txt won't appear in the channel ever
> >     assertTrue(out.contains("file1"));
> >     assertTrue(out.contains("file2"));
> >   }

Thanks for your review. I will modify the code asap.


- qiao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review143374
-----------------------------------------------------------


On \u4e03\u6708 24, 2016, 10:37 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated \u4e03\u6708 24, 2016, 10:37 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 1334500 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>


Re: Review Request 50378: FLUME-2960: Support Wildcards in directoryname in TaildirSource

Posted by Attila Simon <sa...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50378/#review143374
-----------------------------------------------------------



Hi,

Your change interfears with a caching mechanism. I believe the best way to describe this interference is this test below. The problem in a nutshell is when a wildcard is specified then the parent directory used by file matching is not the immediate parent but the directory high above. Thus its last modification time won't be updated when a new file was added. Only mtime of dir1 is updated when file2.txt is added but caching is initialized by fg1 directory so it only monitors that for changes. I think either a documentation should state that when using wildcards then caching won't work or even better an assertation should check this combination at startup. (Making cache work with wildcard would be the best).


  @Test
  public void testWildcardsDirFilteringCache() throws IOException, InterruptedException {
    //first iteration everything is working as expected
    File f1 = new File(tmpDir.getAbsolutePath() + "/fg1/dir1/file1.txt");
    Files.createParentDirs(f1);
    Files.write("file1\n", f1, Charsets.UTF_8);

    Context context = new Context();
    context.put(POSITION_FILE, posFilePath);
    context.put(FILE_GROUPS, "fg1");
    context.put(FILE_GROUPS_PREFIX + "fg1", tmpDir.getAbsolutePath() + "/fg1/*/file.*");

    Configurables.configure(source, context);
    source.start();
    source.process();
    Transaction txn = channel.getTransaction();
    txn.begin();
    List<String> out = Lists.newArrayList();
    for (int i = 0; i < 2; i++) {
      Event e = channel.take();
      if (e != null) {
        out.add(TestTaildirEventReader.bodyAsString(e));
      }
    }
    txn.commit();
    txn.close();

    // empty iterations simulating that time is passing by
    Thread.sleep(1000);
    source.process();
    Thread.sleep(1000);
    
    //file was created after a while it should be picked up as well
    File f2 = new File(tmpDir.getAbsolutePath() + "/fg1/dir1/file2.txt");
    Files.write("file2\n", f2, Charsets.UTF_8);

    source.process();
    txn = channel.getTransaction();
    txn.begin();
    for (int i = 0; i < 2; i++) {
      Event e = channel.take();
      if (e != null) {
        out.add(TestTaildirEventReader.bodyAsString(e));
      }
    }
    txn.commit();
    txn.close();
    
    assertEquals(2, out.size()); //fails as file2.txt won't appear in the channel ever
    assertTrue(out.contains("file1"));
    assertTrue(out.contains("file2"));
  }

- Attila Simon


On July 24, 2016, 10:37 a.m., qiao wen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50378/
> -----------------------------------------------------------
> 
> (Updated July 24, 2016, 10:37 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> In our log management project, we wan't to track many log files like this:
> /app/dir1/log.*
> /app/dir2/log.*
> ...
> /app/dirn/log.*
> But TaildirSource can't support wildcards in filegroup directory name. The following config is expected:
> a1.sources.r1.filegroups.fg = /app/*/log.*
> 
> 
> Diffs
> -----
> 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 1334500 
>   flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java ad9f720 
>   flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java 097ee0b 
> 
> Diff: https://reviews.apache.org/r/50378/diff/
> 
> 
> Testing
> -------
> 
> All tests in TestTaildirSource passed.
> 
> 
> Thanks,
> 
> qiao wen
> 
>