You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by mosermw <gi...@git.apache.org> on 2018/06/06 19:07:53 UTC

[GitHub] nifi pull request #2767: NIFI-5274 avoid rollback on uncaught errors in Repl...

GitHub user mosermw opened a pull request:

    https://github.com/apache/nifi/pull/2767

    NIFI-5274 avoid rollback on uncaught errors in ReplaceText

    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [x] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [x] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [x] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mosermw/nifi nifi-5274

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/2767.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2767
    
----
commit f874890aa27463bb0c6a755062848990ad81931f
Author: Mike Moser <mo...@...>
Date:   2018-06-06T18:25:32Z

    NIFI-5274 avoid rollback on uncaught errors in ReplaceText

----


---

[GitHub] nifi issue #2767: NIFI-5274 avoid rollback on uncaught errors in ReplaceText

Posted by mosermw <gi...@git.apache.org>.
Github user mosermw commented on the issue:

    https://github.com/apache/nifi/pull/2767
  
    @mattyb149  I think the issue is whether there is a reasonable expectation that a user would loop back a 'failure' relationship, and precedent set in other similar processors.  For example, I consider ReplaceText in the same category as processors that modify content such as CompressContent, UnpackContent, and EncryptContent.  None of those processors penalize flowfiles sent to failure.  In those processors it's not reasonable to expect a failure to correct itself, so it's not reasonable to loop back the failure relationship.  I just followed that precedent when modifying ReplaceText for this PR.


---

[GitHub] nifi issue #2767: NIFI-5274 avoid rollback on uncaught errors in ReplaceText

Posted by joewitt <gi...@git.apache.org>.
Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/2767
  
    With the changes made it isn't really a recommended way to do it.  What the change does is look for specific runtime exceptions that the framework could throw and which the user code of this processor couldn't really do anything meaningful and causes the normal behavior that will happen if these exceptions leak out which is yield/rollback.  But that was done so that 'other' exceptions besides those could then be handled in a way specific to what is desired here and that is to treat the flowfile as a failure and route it appropriately.  Typically, you'd not want to be treating general exceptions as the reason for 'failure' routing and instead you'd be looking for specific conditions.  So the change here I think is just dealing with the reality of the limited options this processor must have (i didnt' look into the details but that is what makes sense to me).


---

[GitHub] nifi issue #2767: NIFI-5274 avoid rollback on uncaught errors in ReplaceText

Posted by mosermw <gi...@git.apache.org>.
Github user mosermw commented on the issue:

    https://github.com/apache/nifi/pull/2767
  
    @ottobackwards  This post is old but I think still applies.  In general, most regular expression engines use recursion for some things.  https://stackoverflow.com/questions/7509905/java-lang-stackoverflowerror-while-using-a-regex-to-parse-big-strings


---

[GitHub] nifi issue #2767: NIFI-5274 avoid rollback on uncaught errors in ReplaceText

Posted by joewitt <gi...@git.apache.org>.
Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/2767
  
    @mosermw nah its ok - you're probably right


---

[GitHub] nifi issue #2767: NIFI-5274 avoid rollback on uncaught errors in ReplaceText

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/nifi/pull/2767
  
    Can you point out the recursive code you are referencing?


---

[GitHub] nifi issue #2767: NIFI-5274 avoid rollback on uncaught errors in ReplaceText

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/2767
  
    I'm having trouble following the above conversation: why doesn't the penalize apply here? I can't remember the default prioritizer but I'd think in general we wouldn't want to have a large file clog the input stream if the "failure" relationship was routed back to the processor?


---

[GitHub] nifi issue #2767: NIFI-5274 avoid rollback on uncaught errors in ReplaceText

Posted by joewitt <gi...@git.apache.org>.
Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/2767
  
    My response was to the general pattern of how to handle exceptions/failure routing.
    
    Category1: For things you can anticipate and detect in the flowfile that make processing not viable - we should ideally detect these and route to failure as they'll never process nicely so yield/rollback isnt viable.
    
    Category2:For things you can anticipate in flowfiles that will cause processing problems but detection isn't viable/etc.. then relying on a specific known set of exceptions is fair game.  Route these to failure as yield/rollback isn't viable and wont solve the problem.
    
    Category3: If you know bad stuff can happen but cannot detect them ahead of time and there isn't a clear finite set of things to detect then doing what you're doing to isolate them by instead putting the frameworky things in a finite set and calling all else as 'failure' is fair game.
    
    Your case appears to be Category3 right now so you're handling it fine for that.  Ideally though you could try to move to Category 1 and if not then fallback to Category2.
    
    Again I'm really responding more to Otto's question about the patterns/recommendations than what 'you did'.  I didn't look into the details of the options you had available to you and if indeed Category3 is as good as it gets then I think you did the right thing here.
    
    In this case my hope would be we could detect configurations or data patterns that would lead to StackOverflowError and avoid that altogether but i'm saying that without understanding the details of this case so please dont take that too seriously.
    
    Hopefully that makes more sense.



---

[GitHub] nifi issue #2767: NIFI-5274 avoid rollback on uncaught errors in ReplaceText

Posted by mosermw <gi...@git.apache.org>.
Github user mosermw commented on the issue:

    https://github.com/apache/nifi/pull/2767
  
    I'm not sure I understand what your suggestion is @joewitt.  Would you clarify for me?  Should I code it to catch StackOverflowError, which is the use case that I need to handle cleanly?
    The code pattern I used here has worked really well for me in custom processors to avoid yield/rollback on exceptions caused by flowfile content, while maintaining a desired yield/rollback on framework exceptions in other situations.


---

[GitHub] nifi pull request #2767: NIFI-5274 avoid rollback on uncaught errors in Repl...

Posted by joewitt <gi...@git.apache.org>.
Github user joewitt commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2767#discussion_r193866550
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ReplaceText.java ---
    @@ -287,14 +287,24 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
     
             if (evaluateMode.equalsIgnoreCase(ENTIRE_TEXT)) {
                 if (flowFile.getSize() > maxBufferSize && replacementStrategyExecutor.isAllDataBufferedForEntireText()) {
    -                session.transfer(flowFile, REL_FAILURE);
    +                session.transfer(session.penalize(flowFile), REL_FAILURE);
    --- End diff --
    
    @mosermw are you penalizing it here in case someone is looping failure?  If they loop failure relationship then the problem will happen over and over anyway and perhaps better we just doc that they should not do so.  Were you thinking a downstream consumer of this would prefer it to be penalized?  Just trying to understand the benefit of penalizing here.


---

[GitHub] nifi issue #2767: NIFI-5274 avoid rollback on uncaught errors in ReplaceText

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/nifi/pull/2767
  
    Thanks for the info @mosermw 


---

[GitHub] nifi issue #2767: NIFI-5274 avoid rollback on uncaught errors in ReplaceText

Posted by mosermw <gi...@git.apache.org>.
Github user mosermw commented on the issue:

    https://github.com/apache/nifi/pull/2767
  
    One never knows how a dataflow manager will design their flow, so I was just trying to cover all bases.  I don't think a downstream consumer would notice that the failure files were penalized, but if someone did loop their failure relationship, they would certainly notice if a failed file wan't penalized.
    I can remove the call to penalize.


---

[GitHub] nifi issue #2767: NIFI-5274 avoid rollback on uncaught errors in ReplaceText

Posted by mosermw <gi...@git.apache.org>.
Github user mosermw commented on the issue:

    https://github.com/apache/nifi/pull/2767
  
    The situation I'm trying to catch uses a regular expression that looks legitimate, so can't be caught while validating configuration, but then blows up when the input flowfile causes the Pattern matcher to produce too many matches while using recursive code.  I do think this falls into a Category 3 issue.  I simplified the code to catch StackOverflowError and comment why the catch exists.


---

[GitHub] nifi issue #2767: NIFI-5274 avoid rollback on uncaught errors in ReplaceText

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/2767
  
    +1 LGTM, thanks to all for the reviews, and thanks Mike for the improvement! Merging to master


---

[GitHub] nifi pull request #2767: NIFI-5274 avoid rollback on uncaught errors in Repl...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/2767


---

[GitHub] nifi issue #2767: NIFI-5274 avoid rollback on uncaught errors in ReplaceText

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/nifi/pull/2767
  
    mvn package -P contrib-check passes,
    code looks good.
    
    The question that comes to my mind is if this is following some 'recommended' way of handling these exceptions, and should that be documented or referenced.  But I haven't seen any other processor document it's yield v. fail strategy.
    
    +1 fwiw



---

[GitHub] nifi issue #2767: NIFI-5274 avoid rollback on uncaught errors in ReplaceText

Posted by mosermw <gi...@git.apache.org>.
Github user mosermw commented on the issue:

    https://github.com/apache/nifi/pull/2767
  
    Yeah, but there's only 1 other processor in the codebase that does data transformation and does penalize on failure.  By far most processors that penalize on failure are ingress/egress processors, where it makes more sense.


---