Posted to issues@nifi.apache.org by ijokarumawak <gi...@git.apache.org> on 2018/12/05 06:21:39 UTC
[GitHub] nifi pull request #3200: NIFI-5826 WIP Fix back-slash escaping at Lexers
GitHub user ijokarumawak opened a pull request:
https://github.com/apache/nifi/pull/3200
NIFI-5826 WIP Fix back-slash escaping at Lexers
## Summary
The current Lexers convert a back-slash followed by another character (e.g. '\[') into a double back-slash followed by that character (e.g. '\\[').
However, detailed analysis (see below) suggests that this conversion is wrong and that such sequences should be left as they are.
## Details
I debugged how Lexer works, and found that:
- The `ESC` fragment handles escaped special characters in the String representation. For example, the String `\t` is converted to an actual tab character.
- String values entered by the user in the NiFi UI are passed to the `RecordPath.compile` method as they are. E.g. the input string `replaceRegex(/name, '\[', '')` is passed as is, and then the single back-slash is converted to a double back-slash by line 155 of the ESC fragment.
- I believe lines 153-156 are meant to preserve escaped characters as they are, because such escapes don't mean anything in the RecordPath/AttrExpLang spec. They should be unescaped later by the underlying syntax, such as RegEx.
- However, the current line 155 does this incorrectly. It should append a single back-slash.
- Other Lexers (AttributeExpressionLexer.g and HL7QueryLexer.g) have the same issue.
- So, I think we should fix all Lexers instead of adding another conversion.
Here is the [Lexer code](https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record-path/src/main/antlr3/org/apache/nifi/record/path/RecordPathLexer.g#L143) for reference:
```
143 fragment
144 ESC
145 : '\\'
146 (
147 '"' { setText("\""); }
148 | '\'' { setText("\'"); }
149 | 'r' { setText("\r"); }
150 | 'n' { setText("\n"); }
151 | 't' { setText("\t"); }
152 | '\\' { setText("\\\\"); }
153 | nextChar = ~('"' | '\'' | 'r' | 'n' | 't' | '\\')
154 {
155 StringBuilder lBuf = new StringBuilder(); lBuf.append("\\\\").appendCodePoint(nextChar); setText(lBuf.toString());
156 }
157 )
158 ;
```
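To make the effect of line 155 concrete, here is a minimal Java sketch that imitates only the fallback branch (lines 153-156) outside of ANTLR. The class and method names (`EscFallbackSketch`, `escCurrent`, `escProposed`, `compiles`) are hypothetical and exist only for illustration; the actual change in this PR is made in the Lexer grammar files.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical stand-alone imitation of the ESC fallback branch; not NiFi code.
class EscFallbackSketch {

    // What line 155 emits today: two back-slashes, then the next character.
    static String escCurrent(int nextChar) {
        return new StringBuilder().append("\\\\").appendCodePoint(nextChar).toString();
    }

    // Assumed fix: one back-slash, so the escape reaches the regex engine unchanged.
    static String escProposed(int nextChar) {
        return new StringBuilder().append("\\").appendCodePoint(nextChar).toString();
    }

    // Returns true if java.util.regex accepts the given pattern text.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String current = escCurrent('[');   // the three characters \\[
        String proposed = escProposed('['); // the two characters \[
        System.out.println(current + " compiles: " + compiles(current));
        System.out.println(proposed + " compiles: " + compiles(proposed));
    }
}
```

With the doubled back-slash, the regex engine reads an escaped back-slash followed by an opening `[` that is never closed, so compilation fails. With a single back-slash, `\[` is an escaped literal bracket and compiles.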
## NiFi template for test
Here is a NiFi flow template for comparing the behavior before and after this change.
https://gist.github.com/ijokarumawak/b6bdca8074a4457bc4a425b90a6b17f0
To try the template, build this PR as NiFi 1.9.0-SNAPSHOT, then download the following 1.8.0 NARs into your SNAPSHOT's lib dir so that both versions can be used in the flow.
- https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-standard-nar/1.8.0/nifi-standard-nar-1.8.0.nar
- https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-update-attribute-nar/1.8.0/nifi-update-attribute-nar-1.8.0.nar
## Test result
### UpdateAttribute test for backward compatibility
![image](https://user-images.githubusercontent.com/1107620/49493740-93078700-f8a0-11e8-9360-025254b39551.png)
GenerateFlowFile generates FlowFiles with attribute `a` whose value is:
```
this is new line
and this is just a backslash \n
```
![image](https://user-images.githubusercontent.com/1107620/49493751-9c90ef00-f8a0-11e8-8d41-6005f01157a7.png)
Result
1.8.0
![image](https://user-images.githubusercontent.com/1107620/49493779-b5010980-f8a0-11e8-9911-22c0d71e865b.png)
1.9.0-SNAPSHOT
![image](https://user-images.githubusercontent.com/1107620/49493786-baf6ea80-f8a0-11e8-8c04-7efb54167345.png)
### UpdateRecord test illustrating the NIFI-5826 issue is addressed
![image](https://user-images.githubusercontent.com/1107620/49493825-e083f400-f8a0-11e8-8b4a-cf17e370282e.png)
GenerateFlowFile generates content:
```
key,value
on[e,1
[two,2
```
![image](https://user-images.githubusercontent.com/1107620/49493836-e8439880-f8a0-11e8-8e89-f07db2712690.png)
Result
1.8.0
Regex compilation error as reported
![image](https://user-images.githubusercontent.com/1107620/49493844-f09bd380-f8a0-11e8-822f-f747f3184fc5.png)
1.9.0-SNAPSHOT
The square brackets are converted successfully
![image](https://user-images.githubusercontent.com/1107620/49493863-03160d00-f8a1-11e8-8f11-ac95670412ed.png)
---
Thank you for submitting a contribution to Apache NiFi.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
- [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
- [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
- [ ] Is your initial contribution a single, squashed commit?
### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
### Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ijokarumawak/nifi NIFI-5826
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nifi/pull/3200.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3200
----
commit ccb6f9265d99850debfe56f5bb0849ae9814a6d4
Author: Koji Kawamura <ij...@...>
Date: 2018-12-05T06:03:21Z
NIFI-5826 Fix back-slash escaping at Lexers
----
---
[GitHub] nifi issue #3200: NIFI-5826 Fix back-slash escaping at Lexers
Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on the issue:
https://github.com/apache/nifi/pull/3200
@bdesert @ottobackwards Thanks for reviewing. I've added unit tests.
For reference, I ran the added tests with the Lexer as it was before this PR. The following tests failed with the current Lexer, but pass with the updated Lexer:
This test failed with the original Lexer because `[\s]` was converted to `[\\s]` by the Lexer and no longer matched.
```
[ERROR] testReplaceRegexEscapedCharacters(org.apache.nifi.record.path.TestRecordPath) Time elapsed: 0.007 s <<< FAILURE!
org.junit.ComparisonFailure:
Replacing whitespace to new line expected:<John[
]Doe> but was:<John[ ]Doe>
at org.apache.nifi.record.path.TestRecordPath.testReplaceRegexEscapedCharacters(TestRecordPath.java:1046)
```
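The mismatch can be reproduced with plain `java.util.regex`, independent of NiFi. The sketch below is not the actual test from `TestRecordPath`; the input `John Doe` and the newline replacement are inferred from the failure message above, and the class and method names are hypothetical.

```java
// Hypothetical reproduction of the whitespace case using only the JDK.
class WhitespaceClassSketch {

    static String replace(String input, String regex) {
        // Replace every match with a newline character.
        return input.replaceAll(regex, "\n");
    }

    public static void main(String[] args) {
        String intended = "[\\s]";  // pattern text [\s]: a class matching whitespace
        String doubled = "[\\\\s]"; // pattern text [\\s]: a class matching a back-slash or the letter s

        // The intended pattern replaces the space between the two names.
        System.out.println(replace("John Doe", intended).equals("John\nDoe"));
        // The doubled pattern finds nothing to replace in this input.
        System.out.println(replace("John Doe", doubled).equals("John Doe"));
    }
}
```

Inside a character class, `\\` is an escaped back-slash and `s` is an ordinary letter, so the doubled form never matches the space and the value comes back unchanged, as in the failure above.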
This test failed with the original Lexer because `\[` was converted to `\\[` by the Lexer, which produced a RegEx syntax error.
```
[ERROR] testReplaceRegexEscapedBrackets(org.apache.nifi.record.path.TestRecordPath) Time elapsed: 0.001 s <<< ERROR!
org.apache.nifi.record.path.exception.RecordPathException:
java.util.regex.PatternSyntaxException: Unclosed character class near index 2
\\[
^
at org.apache.nifi.record.path.TestRecordPath.testReplaceRegexEscapedBrackets(TestRecordPath.java:1149)
```
Both tests failed for the same reason, and both are fixed by this PR.
---
[GitHub] nifi issue #3200: NIFI-5826 WIP Fix back-slash escaping at Lexers
Posted by bdesert <gi...@git.apache.org>.
Github user bdesert commented on the issue:
https://github.com/apache/nifi/pull/3200
Reviewing...
---
[GitHub] nifi issue #3200: NIFI-5826 WIP Fix back-slash escaping at Lexers
Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:
https://github.com/apache/nifi/pull/3200
There should be new tests that go along with this
---