You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by mgaido91 <gi...@git.apache.org> on 2017/12/15 00:19:42 UTC

[GitHub] nifi pull request #2343: NIFI-2169: Cache compiled regexp for RouteText

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/nifi/pull/2343

    NIFI-2169: Cache compiled regexp for RouteText

    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [x] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [x] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [x] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? N/A
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly? N/A
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly? N/A
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties? N/A
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered? N/A
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/nifi NIFI-2169

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/2343.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2343
    
----
commit f6b9aab6775f62863071321618bfa151e3b34c97
Author: Marco Gaido <mg...@hortonworks.com>
Date:   2017-12-14T23:44:32Z

    NIFI-2169: Cache compiled regexp for RouteText

----


---

[GitHub] nifi pull request #2343: NIFI-2169: Cache compiled regexp for RouteText

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2343#discussion_r157230846
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/RouteText.java ---
    @@ -209,6 +215,30 @@
         private volatile Map<Relationship, PropertyValue> propertyMap = new HashMap<>();
         private volatile Pattern groupingRegex = null;
     
    +    @VisibleForTesting
    +    final static int PATTERNS_CACHE_MAXIMUM_ENTRIES = 10;
    --- End diff --
    
    We could probably cache more than 10 here. I think the idea on the PR was simply to convey that we need a reasonable upward bound, rather than allowing it to grow indefinitely. I would tend to lean more toward say 100 personally? Or even 1024 or so. A compiled Pattern is fairly small I believe in terms of heap utilization, so I wouldn't be concerned personally with such a limit.


---

[GitHub] nifi issue #2343: NIFI-2169: Cache compiled regexp for RouteText

Posted by joewitt <gi...@git.apache.org>.
Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/2343
  
    @mgaido91 it failed due to a checkstyle issue.  You'll want to look at the raw logs and scroll to the bottom.  The key issue is
    
    [WARNING] src/test/java/org/apache/nifi/processors/standard/TestRouteText.java:[34,8] (imports) UnusedImports: Unused import - org.apache.nifi.util.MockProcessSession.


---

[GitHub] nifi issue #2343: NIFI-2169: Cache compiled regexp for RouteText

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/nifi/pull/2343
  
    @markap14 I saw that you made most of the commits to it. May you help me reviewing this? Thanks.
    
    PS travis CI is failing because it reaches the 4MB log length limit. I am not sure why this is happening. Can anyone help me? thanks


---

[GitHub] nifi pull request #2343: NIFI-2169: Cache compiled regexp for RouteText

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2343#discussion_r157241548
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/RouteText.java ---
    @@ -209,6 +215,30 @@
         private volatile Map<Relationship, PropertyValue> propertyMap = new HashMap<>();
         private volatile Pattern groupingRegex = null;
     
    +    @VisibleForTesting
    +    final static int PATTERNS_CACHE_MAXIMUM_ENTRIES = 10;
    +
    +    /**
    +     * LRU cache for the compiled patterns. The size of the cache is determined by the value of
    +     * {@link #PATTERNS_CACHE_MAXIMUM_ENTRIES}.
    +     */
    +    @VisibleForTesting
    +    final ConcurrentMap<Pair<Boolean, String>, Pattern> patternsCache = CacheBuilder.newBuilder()
    --- End diff --
    
    yes, this was my other option. I will do this, thanks.


---

[GitHub] nifi issue #2343: NIFI-2169: Cache compiled regexp for RouteText

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/nifi/pull/2343
  
    @markap14 do you have further comments on this? Thanks


---

[GitHub] nifi issue #2343: NIFI-2169: Cache compiled regexp for RouteText

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/2343
  
    @mgaido91 Travis-CI also reports the following checkstyle violation:
    `[WARNING] src/test/java/org/apache/nifi/processors/standard/TestRouteText.java:[34,8] (imports) UnusedImports: Unused import - org.apache.nifi.util.MockProcessSession.
    `



---

[GitHub] nifi issue #2343: NIFI-2169: Cache compiled regexp for RouteText

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/2343
  
    @mgaido91 sorry I didn't see Joe's comments until I made that last one :)


---

[GitHub] nifi pull request #2343: NIFI-2169: Cache compiled regexp for RouteText

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2343#discussion_r157241247
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/RouteText.java ---
    @@ -209,6 +215,30 @@
         private volatile Map<Relationship, PropertyValue> propertyMap = new HashMap<>();
         private volatile Pattern groupingRegex = null;
     
    +    @VisibleForTesting
    +    final static int PATTERNS_CACHE_MAXIMUM_ENTRIES = 10;
    --- End diff --
    
    yes, I was thinking about introducing a configuration property for it but it seemed an overkill to me. I agree that it is not a big deal having more. I will set it to 1024, thanks.


---

[GitHub] nifi pull request #2343: NIFI-2169: Cache compiled regexp for RouteText

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2343#discussion_r157232403
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/RouteText.java ---
    @@ -209,6 +215,30 @@
         private volatile Map<Relationship, PropertyValue> propertyMap = new HashMap<>();
         private volatile Pattern groupingRegex = null;
     
    +    @VisibleForTesting
    +    final static int PATTERNS_CACHE_MAXIMUM_ENTRIES = 10;
    +
    +    /**
    +     * LRU cache for the compiled patterns. The size of the cache is determined by the value of
    +     * {@link #PATTERNS_CACHE_MAXIMUM_ENTRIES}.
    +     */
    +    @VisibleForTesting
    +    final ConcurrentMap<Pair<Boolean, String>, Pattern> patternsCache = CacheBuilder.newBuilder()
    +            .maximumSize(PATTERNS_CACHE_MAXIMUM_ENTRIES)
    +            .<Pair<Boolean, String>, Pattern>build()
    +            .asMap();
    +
    +    private final Function<Pair<Boolean, String>, Pattern> compileRegex = ignoreCaseAndRegex -> {
    --- End diff --
    
    Again, I would avoid the use of the Pair<Boolean, String> here... and really would probably avoid the Function all together. Since it seems to be referenced only once, I'd prefer to instead just inline the use in the cacheCompiledPattern method, so that there we could just call something like:
    
    `return patternsCache.computeIfAbsent(key, toCompile -> ignoreCase ? Pattern.compile(toCompile, Pattern.CASE_INSENSITIVE) : Pattern.compile(toCompile));`


---

[GitHub] nifi issue #2343: NIFI-2169: Cache compiled regexp for RouteText

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/nifi/pull/2343
  
    I see, thanks @joewitt. I made some errors in finding the right log in travis. I still have to familiarize with it. Thank you very much, I am fixing it.


---

[GitHub] nifi pull request #2343: NIFI-2169: Cache compiled regexp for RouteText

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2343#discussion_r157231329
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/RouteText.java ---
    @@ -209,6 +215,30 @@
         private volatile Map<Relationship, PropertyValue> propertyMap = new HashMap<>();
         private volatile Pattern groupingRegex = null;
     
    +    @VisibleForTesting
    +    final static int PATTERNS_CACHE_MAXIMUM_ENTRIES = 10;
    +
    +    /**
    +     * LRU cache for the compiled patterns. The size of the cache is determined by the value of
    +     * {@link #PATTERNS_CACHE_MAXIMUM_ENTRIES}.
    +     */
    +    @VisibleForTesting
    +    final ConcurrentMap<Pair<Boolean, String>, Pattern> patternsCache = CacheBuilder.newBuilder()
    --- End diff --
    
    I think we can simplify this some. Because the notion of 'ignore case' is going to be true for all regexes or false for all regexes, I think we can get rid of the Pair<Boolean, String> and just use String as the key. Then we'd just need to ensure that we clear the cache in the @OnScheduled method or in the onPropertyModified if that property is changed.


---

[GitHub] nifi pull request #2343: NIFI-2169: Cache compiled regexp for RouteText

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/2343


---

[GitHub] nifi pull request #2343: NIFI-2169: Cache compiled regexp for RouteText

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2343#discussion_r157242761
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/RouteText.java ---
    @@ -209,6 +215,30 @@
         private volatile Map<Relationship, PropertyValue> propertyMap = new HashMap<>();
         private volatile Pattern groupingRegex = null;
     
    +    @VisibleForTesting
    +    final static int PATTERNS_CACHE_MAXIMUM_ENTRIES = 10;
    +
    +    /**
    +     * LRU cache for the compiled patterns. The size of the cache is determined by the value of
    +     * {@link #PATTERNS_CACHE_MAXIMUM_ENTRIES}.
    +     */
    +    @VisibleForTesting
    +    final ConcurrentMap<Pair<Boolean, String>, Pattern> patternsCache = CacheBuilder.newBuilder()
    +            .maximumSize(PATTERNS_CACHE_MAXIMUM_ENTRIES)
    +            .<Pair<Boolean, String>, Pattern>build()
    +            .asMap();
    +
    +    private final Function<Pair<Boolean, String>, Pattern> compileRegex = ignoreCaseAndRegex -> {
    --- End diff --
    
    for the `Pair`, I'll get rid of it following your suggestion above.
    
    For the `Function`, my goal in defining the function as a parameter was to avoid the creation of a new `Function` object at every invocation, using always the same. What do you think?


---