Posted to issues@nifi.apache.org by ijokarumawak <gi...@git.apache.org> on 2018/12/05 06:21:39 UTC

[GitHub] nifi pull request #3200: NIFI-5826 WIP Fix back-slash escaping at Lexers

GitHub user ijokarumawak opened a pull request:

    https://github.com/apache/nifi/pull/3200

    NIFI-5826 WIP Fix back-slash escaping at Lexers

    ## Summary
    The current Lexers convert a back-slash followed by any other character (e.g. `\[`) into a double back-slash followed by that character (e.g. `\\[`).
    However, a detailed analysis (see below) suggests that this conversion is wrong and that such sequences should be left as they are.
    
    ## Details
    I debugged how the Lexer works and found that:
    
    - The `ESC` fragment handles escaped special characters in string literals. For example, the string `\t` is converted to an actual tab character.
    - String values that users enter in the NiFi UI are passed to the `RecordPath.compile` method as is. For example, the input string `replaceRegex(/name, '\[', '')` is passed unchanged, and then line 155 of the `ESC` fragment converts the single back-slash into a double back-slash.
    - I believe lines 153-156 are intended to preserve escaped characters as they are, because such escapes have no meaning in the RecordPath / Expression Language spec. They should be unescaped later by the underlying syntax, such as RegEx.
        - The current line 155 does this incorrectly: it should append a single back-slash.
        - The other Lexers (AttributeExpressionLexer.g and HL7QueryLexer.g) have the same issue.
    - So, I think we should fix all the Lexers instead of adding another conversion.
    
    Here is the [Lexer code](https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record-path/src/main/antlr3/org/apache/nifi/record/path/RecordPathLexer.g#L143) for reference:
    ```
    143 fragment
    144 ESC
    145   :  '\\'
    146     (
    147         '"'    { setText("\""); }
    148       |  '\''  { setText("\'"); }
    149       |  'r'   { setText("\r"); }
    150       |  'n'   { setText("\n"); }
    151       |  't'   { setText("\t"); }
    152       |  '\\'  { setText("\\\\"); }
    153       |  nextChar = ~('"' | '\'' | 'r' | 'n' | 't' | '\\')
    154        {
    155          StringBuilder lBuf = new StringBuilder(); lBuf.append("\\\\").appendCodePoint(nextChar); setText(lBuf.toString());
    156        }
    157    )
    158  ;
    ```
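    To make lines 153-156 concrete, here is a simplified Java sketch that mimics what each branch of the `ESC` fragment passes to `setText()`. The class `EscFragmentSketch`, its `unescape` method and the `fixedLine155` flag are hypothetical names for illustration only. This is not the ANTLR-generated lexer, and it assumes the fix is limited to line 155 (appending one back-slash instead of two).

    ```java
    /**
     * Hypothetical, simplified re-implementation of the ESC fragment quoted above.
     * It is NOT the generated ANTLR lexer; it only mimics what each branch passes to setText().
     */
    public class EscFragmentSketch {

        /**
         * Processes back-slash escapes in the body of a quoted literal the way the ESC fragment does.
         *
         * @param literal      the characters between the quotes, as typed by the user
         * @param fixedLine155 false = original line 155 (prepend two back-slashes),
         *                     true  = assumed fix (prepend a single back-slash)
         */
        public static String unescape(String literal, boolean fixedLine155) {
            StringBuilder out = new StringBuilder();
            int i = 0;
            while (i < literal.length()) {
                char c = literal.charAt(i++);
                if (c != '\\' || i >= literal.length()) {
                    out.append(c); // ordinary character, or a trailing lone back-slash
                    continue;
                }
                char next = literal.charAt(i++);
                switch (next) {
                    case '"':  out.append('"');    break; // line 147
                    case '\'': out.append('\'');   break; // line 148
                    case 'r':  out.append('\r');   break; // line 149
                    case 'n':  out.append('\n');   break; // line 150
                    case 't':  out.append('\t');   break; // line 151
                    case '\\': out.append("\\\\"); break; // line 152: emits two back-slashes
                    default:                              // lines 153-156: any other character
                        out.append(fixedLine155 ? "\\" : "\\\\").append(next);
                }
            }
            return out.toString();
        }

        public static void main(String[] args) {
            String userInput = "\\[";                         // what the user types: \[
            System.out.println(unescape(userInput, false));   // original line 155: \\[
            System.out.println(unescape(userInput, true));    // fixed line 155:    \[
        }
    }
    ```

    With the single back-slash, the regex engine downstream receives `\[` (a literal bracket) exactly as the user typed it.
    
    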
    
    ## NiFi template for test
    
    Here is a NiFi flow template to test the behavior before and after this change.
    https://gist.github.com/ijokarumawak/b6bdca8074a4457bc4a425b90a6b17f0
    
    To try the template, build this PR as NiFi 1.9.0-SNAPSHOT, then download the following 1.8.0 NARs into your SNAPSHOT's lib directory so that both versions can be used in the flow.
    
    - https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-standard-nar/1.8.0/nifi-standard-nar-1.8.0.nar
    - https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-update-attribute-nar/1.8.0/nifi-update-attribute-nar-1.8.0.nar
    
    ## Test result
    
    ### UpdateAttribute test for backward compatibility
    
    ![image](https://user-images.githubusercontent.com/1107620/49493740-93078700-f8a0-11e8-9360-025254b39551.png)
    
    GenerateFlowFile generates FlowFiles with attribute `a` whose value is:
    ```
    this is new line
    and this is just a backslash \n
    ```
    
    ![image](https://user-images.githubusercontent.com/1107620/49493751-9c90ef00-f8a0-11e8-8d41-6005f01157a7.png)
    
    Result
    1.8.0
    ![image](https://user-images.githubusercontent.com/1107620/49493779-b5010980-f8a0-11e8-9911-22c0d71e865b.png)
    
    1.9.0-SNAPSHOT
    ![image](https://user-images.githubusercontent.com/1107620/49493786-baf6ea80-f8a0-11e8-8c04-7efb54167345.png)
    
    ### UpdateRecord test illustrating the NIFI-5826 issue is addressed
    ![image](https://user-images.githubusercontent.com/1107620/49493825-e083f400-f8a0-11e8-8b4a-cf17e370282e.png)
    
    GenerateFlowFile generates content:
    ```
    key,value
    on[e,1
    [two,2
    ```
    
    ![image](https://user-images.githubusercontent.com/1107620/49493836-e8439880-f8a0-11e8-8e89-f07db2712690.png)
    
    Result
    1.8.0
    Regex compilation error as reported
    ![image](https://user-images.githubusercontent.com/1107620/49493844-f09bd380-f8a0-11e8-822f-f747f3184fc5.png)
    
    1.9.0-SNAPSHOT
    The square brackets are converted successfully
    ![image](https://user-images.githubusercontent.com/1107620/49493863-03160d00-f8a1-11e8-8f11-ac95670412ed.png)
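    As a rough illustration of the fixed result using plain `java.util.regex` rather than NiFi itself, the sketch below applies the regex that the fixed Lexer passes through for `'\['` to the two sample keys. It assumes the record path function behaves like `Matcher.replaceAll` with an empty replacement, as in the `replaceRegex(/name, '\[', '')` example from the Summary. The class and helper names are hypothetical.

    ```java
    import java.util.regex.Pattern;

    public class BracketReplaceSketch {

        // Regex text the fixed Lexer passes through for the user input '\[' (matches a literal '[')
        static final String FIXED_LEXER_REGEX = "\\[";

        // Hypothetical helper: assumes the record path replacement behaves like Matcher.replaceAll
        static String replaceAllMatches(String value, String regex, String replacement) {
            return Pattern.compile(regex).matcher(value).replaceAll(replacement);
        }

        public static void main(String[] args) {
            String[] keys = {"on[e", "[two"}; // the "key" column of the sample CSV
            for (String key : keys) {
                System.out.println(replaceAllMatches(key, FIXED_LEXER_REGEX, "")); // one, then two
            }
        }
    }
    ```

    With the original Lexer the regex would instead be `\\[`, which does not compile, matching the 1.8.0 result above.
    
    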
    
    
    ---
    
    
    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
    
    - [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [ ] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [ ] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ijokarumawak/nifi NIFI-5826

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/3200.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3200
    
----
commit ccb6f9265d99850debfe56f5bb0849ae9814a6d4
Author: Koji Kawamura <ij...@...>
Date:   2018-12-05T06:03:21Z

    NIFI-5826 Fix back-slash escaping at Lexers

----


---

[GitHub] nifi issue #3200: NIFI-5826 Fix back-slash escaping at Lexers

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on the issue:

    https://github.com/apache/nifi/pull/3200
  
    @bdesert @ottobackwards Thanks for reviewing. I've added unit tests.
    
    For reference, I ran the added tests against the Lexer from before this PR. The following tests fail with the current Lexer but pass with the updated one:
    
    This test failed with the original Lexer because the Lexer converted `[\s]` to `[\\s]`, a character class that matches a back-slash or `s` rather than whitespace, so nothing was replaced.
    ```
    [ERROR] testReplaceRegexEscapedCharacters(org.apache.nifi.record.path.TestRecordPath)  Time elapsed: 0.007 s  <<< FAILURE!
    org.junit.ComparisonFailure:
    Replacing whitespace to new line expected:<John[
    ]Doe> but was:<John[ ]Doe>
            at org.apache.nifi.record.path.TestRecordPath.testReplaceRegexEscapedCharacters(TestRecordPath.java:1046)
    ```
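    The mismatch can be reproduced with `String.replaceAll` alone. In this sketch (hypothetical class and helper names, not the actual test code), `[\s]` is the regex the fixed Lexer passes through, while `[\\s]` is what the original Lexer produced.

    ```java
    public class WhitespaceClassSketch {

        // Hypothetical helper mirroring the test's intent: replace each match with a new line
        static String toNewLine(String value, String regex) {
            return value.replaceAll(regex, "\n");
        }

        public static void main(String[] args) {
            String fixedRegex = "[\\s]";      // [\s]  : any whitespace character
            String originalRegex = "[\\\\s]"; // [\\s] : a back-slash or the letter 's'
            System.out.println(toNewLine("John Doe", fixedRegex).equals("John\nDoe"));   // true
            System.out.println(toNewLine("John Doe", originalRegex).equals("John Doe")); // true: nothing matched
        }
    }
    ```
    
    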
    
    This test failed with the original Lexer because the Lexer converted `\[` to `\\[`, which is a RegEx syntax error (an unclosed character class).
    ```
    [ERROR] testReplaceRegexEscapedBrackets(org.apache.nifi.record.path.TestRecordPath)  Time elapsed: 0.001 s  <<< ERROR!
    org.apache.nifi.record.path.exception.RecordPathException:
    java.util.regex.PatternSyntaxException: Unclosed character class near index 2
    \\[
      ^
            at org.apache.nifi.record.path.TestRecordPath.testReplaceRegexEscapedBrackets(TestRecordPath.java:1149)
    ```
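    A minimal sketch (hypothetical class and method names) that reproduces the same `PatternSyntaxException` with `java.util.regex.Pattern` directly: `\\[` is parsed as an escaped back-slash followed by an unterminated character class, whereas `\[` compiles to a literal `[`.

    ```java
    import java.util.regex.Pattern;
    import java.util.regex.PatternSyntaxException;

    public class UnclosedClassSketch {

        /** Returns the PatternSyntaxException description, or null if the regex compiles. */
        static String compileError(String regex) {
            try {
                Pattern.compile(regex);
                return null;
            } catch (PatternSyntaxException e) {
                return e.getDescription();
            }
        }

        public static void main(String[] args) {
            System.out.println(compileError("\\\\["));        // \\[ from the original Lexer: Unclosed character class
            System.out.println(compileError("\\[") == null);  // \[ from the fixed Lexer: true (compiles)
        }
    }
    ```
    
    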
    
    Both tests failed for the same reason, and both are fixed by this PR.


---

[GitHub] nifi issue #3200: NIFI-5826 WIP Fix back-slash escaping at Lexers

Posted by bdesert <gi...@git.apache.org>.
Github user bdesert commented on the issue:

    https://github.com/apache/nifi/pull/3200
  
    Reviewing...


---

[GitHub] nifi issue #3200: NIFI-5826 WIP Fix back-slash escaping at Lexers

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/nifi/pull/3200
  
    There should be new tests that go along with this


---