You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by bdesert <gi...@git.apache.org> on 2018/12/04 12:33:48 UTC

[GitHub] nifi pull request #3183: NIFI-5826 Fix to escaped backslash

Github user bdesert commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3183#discussion_r238642945
  
    --- Diff: nifi-commons/nifi-record-path/src/main/java/org/apache/nifi/record/path/util/RecordPathUtils.java ---
    @@ -39,4 +39,52 @@ public static String getFirstStringValue(final RecordPathSegment segment, final
     
             return stringValue;
         }
    +
    +    /**
    +     * This method handles backslash sequences after ANTLR parser converts all backslash into double ones
    +     * with exception for \t, \r and \n. See
    +     * <a href="file:../../../../../../../../../src/main/antlr3/org/apache/nifi/record/path/RecordPathParser.g">org/apache/nifi/record/path/RecordPathParser.g</a>
    --- End diff --
    
    @ijokarumawak ,
    I understand what you are talking about, that was my first idea as well.
    the problem is coming with regression on WORKING functionality, like EL.
    If we change Lexer in all three cases, we will have a problem of backward compatibility.
    As an example: define GFF with attribute "a1" having value (with actual new line): 
    ```
    "this is new line
    and this is just a backslash \n"
    ```
    (doesn't matter with quotes  or w/o).
    next: UpdateAttribute with:
    `a1: "${a:replaceAll('\\n','_')}"`
    and
    `a2: "${a:replaceAll('\n','_')}"`
    (note single and double backslash)
    In both cases the only character that will be replaced will be actual new line, resulting in:
    `a1=a2="this is new line_and this is just a backslash \n"
    `
    If we change Lexer for EL, this will be changed and will behave differently and will result in:
    `a1 = "his is new line`
    and this is just a backslash _"
    `a2 = "his is new line_and this is just a backslash \n"`
    
    Of course, we can do "right" way for RecordPath regex-based functions, but then we will have different user experience with regex in RecordPath and EL, which I think should be avoided.
    
    ----
    Regarding Java method "unescapeBackslash". As a name suggests, this function it to treat string values having "backslash" in their values. I do agree that for the test cases we have, some parts of the code are dead, but since this is public method in utility class, it can have generic functionality to support wider varieties of the use cases.
    
    Would appreciate your feedback!


---