You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@nifi.apache.org by Andy LoPresto <al...@apache.org> on 2016/05/25 21:34:21 UTC

[discuss] Proposed addition of replaceRegex to expression language

Hi all,

During investigation of an expression language issue posted to the list, I discovered that replace explicitly delegates to a String#replace invocation that only accepts literal expressions, not regular expressions, while replaceAll accepts regular expressions. I thought this was an oversight and filed NIFI-1919 [1] to document and fix this, by changing the ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular expressions. I wrote failing unit tests [3] to capture the fix. After implementing the change, two existing unit tests [4] broke, which indicated a regression. At first, I believed these two tests to be incorrect, but further investigation showed they were merely surprising.

TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call, but the test is asserting that replace should replace multiple instances of the single quote. While this is similar to String#replace, because the expression language exposes only two methods — replace vs. replaceAll — one could easily assume the difference between the two was the number of attempted replacements, rather than the actual difference, which is literal expression vs. pattern.

@Test
public void testQuotingQuotes() {
    final Map<String, String> attributes = new HashMap<>();
    attributes.put("xx", "say 'hi'");

    String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}";
    verifyEquals(query, attributes, "say \"hello\"");

    query = "${xx:replace( \"'\", '\"')}";
    verifyEquals(query, attributes, "say \"hi\"");

    query = "${xx:replace( '\\'', '\"')}";
    System.out.println(query);
    verifyEquals(query, attributes, "say \"hi\"");
}
TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails on the first verifyEquals call with a PatternSyntaxException. I am investigating that further.

@Test
public void testReplaceAllWithOddNumberOfBackslashPairs() {
    final Map<String, String> attributes = new HashMap<>();
    attributes.put("filename", "C:\\temp\\.txt");

    verifyEquals("${filename:replace('\\\\', '/')}", attributes, "C:/temp/.txt");
    verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes, "C:/temp/.txt");
    verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes, "C:\\temp");
}
While I originally had just modified replace, after looking at the EL documentation [5], replace is explicitly documented to only replace literal expressions, and does so multiple times, as does Java’s String#replace [6]. I now propose to add another method replaceFirst, which accepts a pattern and replaces only the first occurrence. I will update the unit tests to properly capture this, and will update the documentation to reflect the new method.

Thoughts from the community?

[1] https://issues.apache.org/jira/browse/NIFI-1919 <https://issues.apache.org/jira/browse/NIFI-1919>
[2] https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java>
[3] https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy>
[4] https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java>
[5] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace
[6] https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence) <https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence, java.lang.CharSequence)>


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69


Re: [discuss] Proposed addition of replaceRegex to expression language

Posted by Andy LoPresto <al...@apache.org>.
After some discussion and further thought, leaving “replace” to handle literal Strings which contain escape characters may be convenient. In this case I would suggest either renaming “replace” to “replaceAllLiteral” or the other methods to “replaceFirstRegex” and “replaceAllRegex” to illustrate the differences.


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On May 27, 2016, at 10:25 AM, Andy LoPresto <al...@apache.org> wrote:
> 
> Following up on this, I am not sure replace is needed at all. ReplaceAll, if called with a “regular expression” that is just a literal String, will perform the same operation as replace would.
> 
> In NIFI-1919, I will not remove replace, as that would break BC, but I submit that for 1.0, we should remove it, as it’s behavior is completely covered by another method. We would then have:
> 
> replaceFirst - accepts a literal or regex
> replaceAll - accepts a literal or regex
> 
> Again, we could perform “flow upgrade” through the StandardFlowSynchronizer to migrate all uses of “replace” to “replaceAll". Thoughts?
> 
> Andy LoPresto
> alopresto@apache.org <ma...@apache.org>
> alopresto.apache@gmail.com <ma...@gmail.com>
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On May 26, 2016, at 10:28 AM, Joe Skora <jskora@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Got it.  That makes sense and provides all the variations of functionality
>> so it sounds like a win!
>> 
>> On Thu, May 26, 2016 at 1:21 PM, Andy LoPresto <alopresto@apache.org <ma...@apache.org>> wrote:
>> 
>>> Hi Joe,
>>> 
>>> I think what you proposed is more than is necessary. replaceAll is
>>> functionally complete already, as if it is passed a literal expression, it
>>> will compile it as such and replace all instances in the subject string. I
>>> think the confusing point is that the difference between replace and
>>> replaceAll is not the number of times the replacement is attempted, but
>>> rather the literal vs. regular expression compilation. As Joe Percivall
>>> suggested, I think three methods (replace(literal),
>>> replaceFirst(regex/literal), and replaceAll(regex/literal)) are sufficient.
>>> 
>>> Having the replaceRegex and replaceLiteral functions with an additional
>>> parameter to control occurrence count would be difficult, as we would
>>> either need to expose some kind of enum or perform string matching to
>>> interpret user-entered flag values.
>>> 
>>> I don’t think replace and replaceAll need to be EOL-ed, I think the
>>> documentation just needs to call out the behavior very explicitly, and I
>>> believe having a new option replaceFirst will also help to suggest the
>>> differences between the methods even if users do not read the entire
>>> documentation.
>>> 
>>> 
>>> Andy LoPresto
>>> alopresto@apache.org <ma...@apache.org>
>>> *alopresto.apache@gmail.com <ma...@gmail.com> <alopresto.apache@gmail.com <ma...@gmail.com>>*
>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>> 
>>> On May 26, 2016, at 10:09 AM, Joe Skora <jskora@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> I think there is transitional path that is non-breaking and could prevent
>>> 0.x to 1.x migration issues, even if at the expense of a little extra code
>>> for a period of time.
>>> 
>>>  1. Leave the existing "replace" and "replaceAll" functions as is for
>>>  now.
>>>  2. Implement the new "replaceRegex" and "replaceLiteral" functions.  I'd
>>>  prefer giving them a parameter selecting first, last, or all occurrences
>>>  over function variants, it could even be optional with all as the
>>> default.
>>>  3. In a future release deprecate the "replace" and "replaceAll"
>>>  functions.
>>>  4. In a future, future release remove the "replace" and "replaceAll"
>>> 
>>>  functions.
>>> 
>>> From past experience managing enterprise systems, users preferred a
>>> transition period like this instead of requiring everything be updated at
>>> once.  We could then provide them a means to find any outstanding use of
>>> the old functionality and clean it up before it was retired.  And from the
>>> system development and management perspective, that saved us a lot of pain
>>> too since we didn't have an influx of minor but troublesome issues in the
>>> midst transitions that could demand our attention be on other bigger
>>> problems.
>>> 
>>> It would be possible to have a signle function that takes another parameter
>>> indicating whether the target is a literal or regular expression, but I
>>> like the separate functions, especially if they will be implemented with
>>> different underlying APIs.
>>> 
>>> On Wed, May 25, 2016 at 10:43 PM, Andy LoPresto <alopresto@apache.org <ma...@apache.org>>
>>> wrote:
>>> 
>>> Joe,
>>> 
>>> These would be breaking changes and a lot of existing workflows would
>>> begin to behave differently. I would suggest making an incremental change
>>> here — simply adding replaceFirst as a non-destructive change as a solution
>>> for this issue, and opening a new Jira for the changes which break backward
>>> compatibility.
>>> 
>>> I do agree that option 1 is probably the cleaner way forward, and if we
>>> introduce new method names, we may be able to use StandardFlowSynchronizer
>>> to detect legacy methods from a pre-1.0 flow and update them automatically
>>> during flow migration.
>>> 
>>> Andy LoPresto
>>> alopresto@apache.org <ma...@apache.org>
>>> *alopresto.apache@gmail.com <ma...@gmail.com> <alopresto.apache@gmail.com <ma...@gmail.com>>*
>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>> 
>>> On May 25, 2016, at 4:06 PM, Joe Percivall <joepercivall@yahoo.com.INVALID <ma...@yahoo.com.invalid>
>>> <joepercivall@yahoo.com.invalid <ma...@yahoo.com.invalid>>
>>> <joepercivall@yahoo.com.invalid <ma...@yahoo.com.invalid>>> wrote:
>>> 
>>> Andy,
>>> 
>>> Nice write-up and thanks for bringing attention to this. I definitely
>>> assumed for a while that replace vs replaceAll was the number of things
>>> replaced. The underlying problem, I think, is that these EL methods are
>>> just wrappers around the Java String methods and the Java String methods
>>> are named in a confusing manner.
>>> 
>>> I am on board with adding a "replaceFirst(regex, replacement)" method.
>>> This adds a bit more functionality and is just exposing another Java String
>>> method.
>>> 
>>> In addition to that, I would suggest doing something to alleviate the
>>> confusion between "replace" and "replaceAll". In a similar fashion to
>>> adding decimal support, I see two avenues we could take:
>>> 
>>> 1. Change the names of the functions to "replaceLiteral" and
>>> "replaceRegex" (or "replaceAllLiteral" and "replaceAllRegex")
>>> 2. Remove one of the methods and add a third field to the remaining method
>>> to indicate whether to replace literally or interpret as a regex
>>> 
>>> Both of these would be breaking changes and introduced with 1.0 and I am
>>> leaning towards option 1 with the base name "replace". I believe when the
>>> "replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would
>>> be easy to understand that they replace all occurrences.
>>> 
>>> Joe
>>> 
>>> - - - - - -
>>> Joseph Percivall
>>> linkedin.com/in/Percivall <http://linkedin.com/in/Percivall>
>>> e: joepercivall@yahoo.com <ma...@yahoo.com>
>>> 
>>> 
>>> 
>>> On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto <alopresto@apache.org <ma...@apache.org>>
>>> wrote:
>>> 
>>> 
>>> 
>>> Hi all,
>>> 
>>> During investigation of an expression language issue posted to the list, I
>>> discovered that replace explicitly delegates to a String#replace invocation
>>> that only accepts literal expressions, not regular expressions, while
>>> replaceAll accepts regular expressions. I thought this was an oversight and
>>> filed NIFI-1919 [1] to document and fix this, by changing the
>>> ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular
>>> expressions. I wrote failing unit tests [3] to capture the fix. After
>>> implementing the change, two existing unit tests [4] broke, which indicated
>>> a regression. At first, I believed these two tests to be incorrect, but
>>> further investigation showed they were merely surprising.
>>> 
>>> TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call,
>>> but the test is asserting that replace should replace multiple instances of
>>> the single quote. While this is similar to String#replace, because the
>>> expression language exposes only two methods — replace vs. replaceAll — one
>>> could easily assume the difference between the two was the number of
>>> attempted replacements, rather than the actual difference, which is literal
>>> expression vs. pattern.
>>> 
>>> @Test
>>> public void testQuotingQuotes() {
>>> final Map<String, String> attributes = new HashMap<>();
>>> attributes.put("xx", "say 'hi'");
>>> 
>>> String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}";
>>> verifyEquals(query, attributes, "say \"hello\"");
>>> 
>>> query = "${xx:replace( \"'\", '\"')}";
>>> verifyEquals(query, attributes, "say \"hi\"");
>>> 
>>> query = "${xx:replace( '\\'' <smb://''>, '\"')}";
>>> System.out.println(query);
>>> verifyEquals(query, attributes, "say \"hi\"");
>>> }
>>> TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails
>>> on the first verifyEquals call with a PatternSyntaxException. I am
>>> investigating that further.
>>> 
>>> @Test
>>> public void testReplaceAllWithOddNumberOfBackslashPairs() {
>>> final Map<String, String> attributes = new HashMap<>();
>>> attributes.put("filename", "C:\\temp\\.txt");
>>> 
>>> verifyEquals("${filename:replace('\\\\', '/')}", attributes,
>>> "C:/temp/.txt");
>>> verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes,
>>> "C:/temp/.txt");
>>> verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes,
>>> "C:\\temp");
>>> }
>>> While I originally had just modified replace, after looking at the EL
>>> documentation [5], replace is explicitly documented to only replace literal
>>> expressions, and does so multiple times, as does Java’s String#replace [6].
>>> I now propose to add another method replaceFirst, which accepts a pattern
>>> and replaces only the first occurrence. I will update the unit tests to
>>> properly capture this, and will update the documentation to reflect the new
>>> method.
>>> 
>>> Thoughts from the community?
>>> 
>>> [1] https://issues.apache.org/jira/browse/NIFI-1919 <https://issues.apache.org/jira/browse/NIFI-1919>
>>> [2]
>>> 
>>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java>
>>> [3]
>>> 
>>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy>
>>> [4]
>>> 
>>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java>
>>> [5]
>>> 
>>> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace <https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace>
>>> [6]
>>> 
>>> https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence) <https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)>
>>> 
>>> 
>>> 
>>> Andy LoPresto
>>> alopresto@apache.org <ma...@apache.org>
>>> alopresto.apache@gmail.com <ma...@gmail.com>
>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 


Re: [discuss] Proposed addition of replaceRegex to expression language

Posted by Andy LoPresto <al...@apache.org>.
Following up on this, I am not sure replace is needed at all. ReplaceAll, if called with a “regular expression” that is just a literal String, will perform the same operation as replace would.

In NIFI-1919, I will not remove replace, as that would break BC, but I submit that for 1.0, we should remove it, as it’s behavior is completely covered by another method. We would then have:

replaceFirst - accepts a literal or regex
replaceAll - accepts a literal or regex

Again, we could perform “flow upgrade” through the StandardFlowSynchronizer to migrate all uses of “replace” to “replaceAll". Thoughts?

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On May 26, 2016, at 10:28 AM, Joe Skora <js...@gmail.com> wrote:
> 
> Got it.  That makes sense and provides all the variations of functionality
> so it sounds like a win!
> 
> On Thu, May 26, 2016 at 1:21 PM, Andy LoPresto <alopresto@apache.org <ma...@apache.org>> wrote:
> 
>> Hi Joe,
>> 
>> I think what you proposed is more than is necessary. replaceAll is
>> functionally complete already, as if it is passed a literal expression, it
>> will compile it as such and replace all instances in the subject string. I
>> think the confusing point is that the difference between replace and
>> replaceAll is not the number of times the replacement is attempted, but
>> rather the literal vs. regular expression compilation. As Joe Percivall
>> suggested, I think three methods (replace(literal),
>> replaceFirst(regex/literal), and replaceAll(regex/literal)) are sufficient.
>> 
>> Having the replaceRegex and replaceLiteral functions with an additional
>> parameter to control occurrence count would be difficult, as we would
>> either need to expose some kind of enum or perform string matching to
>> interpret user-entered flag values.
>> 
>> I don’t think replace and replaceAll need to be EOL-ed, I think the
>> documentation just needs to call out the behavior very explicitly, and I
>> believe having a new option replaceFirst will also help to suggest the
>> differences between the methods even if users do not read the entire
>> documentation.
>> 
>> 
>> Andy LoPresto
>> alopresto@apache.org
>> *alopresto.apache@gmail.com <ma...@gmail.com> <alopresto.apache@gmail.com <ma...@gmail.com>>*
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>> 
>> On May 26, 2016, at 10:09 AM, Joe Skora <jskora@gmail.com <ma...@gmail.com>> wrote:
>> 
>> I think there is transitional path that is non-breaking and could prevent
>> 0.x to 1.x migration issues, even if at the expense of a little extra code
>> for a period of time.
>> 
>>  1. Leave the existing "replace" and "replaceAll" functions as is for
>>  now.
>>  2. Implement the new "replaceRegex" and "replaceLiteral" functions.  I'd
>>  prefer giving them a parameter selecting first, last, or all occurrences
>>  over function variants, it could even be optional with all as the
>> default.
>>  3. In a future release deprecate the "replace" and "replaceAll"
>>  functions.
>>  4. In a future, future release remove the "replace" and "replaceAll"
>> 
>>  functions.
>> 
>> From past experience managing enterprise systems, users preferred a
>> transition period like this instead of requiring everything be updated at
>> once.  We could then provide them a means to find any outstanding use of
>> the old functionality and clean it up before it was retired.  And from the
>> system development and management perspective, that saved us a lot of pain
>> too since we didn't have an influx of minor but troublesome issues in the
>> midst transitions that could demand our attention be on other bigger
>> problems.
>> 
>> It would be possible to have a signle function that takes another parameter
>> indicating whether the target is a literal or regular expression, but I
>> like the separate functions, especially if they will be implemented with
>> different underlying APIs.
>> 
>> On Wed, May 25, 2016 at 10:43 PM, Andy LoPresto <alopresto@apache.org <ma...@apache.org>>
>> wrote:
>> 
>> Joe,
>> 
>> These would be breaking changes and a lot of existing workflows would
>> begin to behave differently. I would suggest making an incremental change
>> here — simply adding replaceFirst as a non-destructive change as a solution
>> for this issue, and opening a new Jira for the changes which break backward
>> compatibility.
>> 
>> I do agree that option 1 is probably the cleaner way forward, and if we
>> introduce new method names, we may be able to use StandardFlowSynchronizer
>> to detect legacy methods from a pre-1.0 flow and update them automatically
>> during flow migration.
>> 
>> Andy LoPresto
>> alopresto@apache.org <ma...@apache.org>
>> *alopresto.apache@gmail.com <ma...@gmail.com> <alopresto.apache@gmail.com <ma...@gmail.com>>*
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>> 
>> On May 25, 2016, at 4:06 PM, Joe Percivall <joepercivall@yahoo.com.INVALID <ma...@yahoo.com.invalid>
>> <joepercivall@yahoo.com.invalid <ma...@yahoo.com.invalid>>
>> <joepercivall@yahoo.com.invalid <ma...@yahoo.com.invalid>>> wrote:
>> 
>> Andy,
>> 
>> Nice write-up and thanks for bringing attention to this. I definitely
>> assumed for a while that replace vs replaceAll was the number of things
>> replaced. The underlying problem, I think, is that these EL methods are
>> just wrappers around the Java String methods and the Java String methods
>> are named in a confusing manner.
>> 
>> I am on board with adding a "replaceFirst(regex, replacement)" method.
>> This adds a bit more functionality and is just exposing another Java String
>> method.
>> 
>> In addition to that, I would suggest doing something to alleviate the
>> confusion between "replace" and "replaceAll". In a similar fashion to
>> adding decimal support, I see two avenues we could take:
>> 
>> 1. Change the names of the functions to "replaceLiteral" and
>> "replaceRegex" (or "replaceAllLiteral" and "replaceAllRegex")
>> 2. Remove one of the methods and add a third field to the remaining method
>> to indicate whether to replace literally or interpret as a regex
>> 
>> Both of these would be breaking changes and introduced with 1.0 and I am
>> leaning towards option 1 with the base name "replace". I believe when the
>> "replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would
>> be easy to understand that they replace all occurrences.
>> 
>> Joe
>> 
>> - - - - - -
>> Joseph Percivall
>> linkedin.com/in/Percivall <http://linkedin.com/in/Percivall>
>> e: joepercivall@yahoo.com <ma...@yahoo.com>
>> 
>> 
>> 
>> On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto <alopresto@apache.org <ma...@apache.org>>
>> wrote:
>> 
>> 
>> 
>> Hi all,
>> 
>> During investigation of an expression language issue posted to the list, I
>> discovered that replace explicitly delegates to a String#replace invocation
>> that only accepts literal expressions, not regular expressions, while
>> replaceAll accepts regular expressions. I thought this was an oversight and
>> filed NIFI-1919 [1] to document and fix this, by changing the
>> ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular
>> expressions. I wrote failing unit tests [3] to capture the fix. After
>> implementing the change, two existing unit tests [4] broke, which indicated
>> a regression. At first, I believed these two tests to be incorrect, but
>> further investigation showed they were merely surprising.
>> 
>> TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call,
>> but the test is asserting that replace should replace multiple instances of
>> the single quote. While this is similar to String#replace, because the
>> expression language exposes only two methods — replace vs. replaceAll — one
>> could easily assume the difference between the two was the number of
>> attempted replacements, rather than the actual difference, which is literal
>> expression vs. pattern.
>> 
>> @Test
>> public void testQuotingQuotes() {
>> final Map<String, String> attributes = new HashMap<>();
>> attributes.put("xx", "say 'hi'");
>> 
>> String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}";
>> verifyEquals(query, attributes, "say \"hello\"");
>> 
>> query = "${xx:replace( \"'\", '\"')}";
>> verifyEquals(query, attributes, "say \"hi\"");
>> 
>> query = "${xx:replace( '\\'' <smb://''>, '\"')}";
>> System.out.println(query);
>> verifyEquals(query, attributes, "say \"hi\"");
>> }
>> TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails
>> on the first verifyEquals call with a PatternSyntaxException. I am
>> investigating that further.
>> 
>> @Test
>> public void testReplaceAllWithOddNumberOfBackslashPairs() {
>> final Map<String, String> attributes = new HashMap<>();
>> attributes.put("filename", "C:\\temp\\.txt");
>> 
>> verifyEquals("${filename:replace('\\\\', '/')}", attributes,
>> "C:/temp/.txt");
>> verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes,
>> "C:/temp/.txt");
>> verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes,
>> "C:\\temp");
>> }
>> While I originally had just modified replace, after looking at the EL
>> documentation [5], replace is explicitly documented to only replace literal
>> expressions, and does so multiple times, as does Java’s String#replace [6].
>> I now propose to add another method replaceFirst, which accepts a pattern
>> and replaces only the first occurrence. I will update the unit tests to
>> properly capture this, and will update the documentation to reflect the new
>> method.
>> 
>> Thoughts from the community?
>> 
>> [1] https://issues.apache.org/jira/browse/NIFI-1919 <https://issues.apache.org/jira/browse/NIFI-1919>
>> [2]
>> 
>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java>
>> [3]
>> 
>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy>
>> [4]
>> 
>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java>
>> [5]
>> 
>> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace <https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace>
>> [6]
>> 
>> https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence) <https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)>
>> 
>> 
>> 
>> Andy LoPresto
>> alopresto@apache.org <ma...@apache.org>
>> alopresto.apache@gmail.com <ma...@gmail.com>
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69


Re: [discuss] Proposed addition of replaceRegex to expression language

Posted by Joe Skora <js...@gmail.com>.
Got it.  That makes sense and provides all the variations of functionality
so it sounds like a win!

On Thu, May 26, 2016 at 1:21 PM, Andy LoPresto <al...@apache.org> wrote:

> Hi Joe,
>
> I think what you proposed is more than is necessary. replaceAll is
> functionally complete already, as if it is passed a literal expression, it
> will compile it as such and replace all instances in the subject string. I
> think the confusing point is that the difference between replace and
> replaceAll is not the number of times the replacement is attempted, but
> rather the literal vs. regular expression compilation. As Joe Percivall
> suggested, I think three methods (replace(literal),
> replaceFirst(regex/literal), and replaceAll(regex/literal)) are sufficient.
>
> Having the replaceRegex and replaceLiteral functions with an additional
> parameter to control occurrence count would be difficult, as we would
> either need to expose some kind of enum or perform string matching to
> interpret user-entered flag values.
>
> I don’t think replace and replaceAll need to be EOL-ed, I think the
> documentation just needs to call out the behavior very explicitly, and I
> believe having a new option replaceFirst will also help to suggest the
> differences between the methods even if users do not read the entire
> documentation.
>
>
> Andy LoPresto
> alopresto@apache.org
> *alopresto.apache@gmail.com <al...@gmail.com>*
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On May 26, 2016, at 10:09 AM, Joe Skora <js...@gmail.com> wrote:
>
> I think there is transitional path that is non-breaking and could prevent
> 0.x to 1.x migration issues, even if at the expense of a little extra code
> for a period of time.
>
>   1. Leave the existing "replace" and "replaceAll" functions as is for
>   now.
>   2. Implement the new "replaceRegex" and "replaceLiteral" functions.  I'd
>   prefer giving them a parameter selecting first, last, or all occurrences
>   over function variants, it could even be optional with all as the
> default.
>   3. In a future release deprecate the "replace" and "replaceAll"
>   functions.
>   4. In a future, future release remove the "replace" and "replaceAll"
>
>   functions.
>
> From past experience managing enterprise systems, users preferred a
> transition period like this instead of requiring everything be updated at
> once.  We could then provide them a means to find any outstanding use of
> the old functionality and clean it up before it was retired.  And from the
> system development and management perspective, that saved us a lot of pain
> too since we didn't have an influx of minor but troublesome issues in the
> midst transitions that could demand our attention be on other bigger
> problems.
>
> It would be possible to have a signle function that takes another parameter
> indicating whether the target is a literal or regular expression, but I
> like the separate functions, especially if they will be implemented with
> different underlying APIs.
>
> On Wed, May 25, 2016 at 10:43 PM, Andy LoPresto <al...@apache.org>
> wrote:
>
> Joe,
>
> These would be breaking changes and a lot of existing workflows would
> begin to behave differently. I would suggest making an incremental change
> here — simply adding replaceFirst as a non-destructive change as a solution
> for this issue, and opening a new Jira for the changes which break backward
> compatibility.
>
> I do agree that option 1 is probably the cleaner way forward, and if we
> introduce new method names, we may be able to use StandardFlowSynchronizer
> to detect legacy methods from a pre-1.0 flow and update them automatically
> during flow migration.
>
> Andy LoPresto
> alopresto@apache.org
> *alopresto.apache@gmail.com <al...@gmail.com>*
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On May 25, 2016, at 4:06 PM, Joe Percivall <joepercivall@yahoo.com.INVALID
> <jo...@yahoo.com.invalid>
> <jo...@yahoo.com.invalid>> wrote:
>
> Andy,
>
> Nice write-up and thanks for bringing attention to this. I definitely
> assumed for a while that replace vs replaceAll was the number of things
> replaced. The underlying problem, I think, is that these EL methods are
> just wrappers around the Java String methods and the Java String methods
> are named in a confusing manner.
>
> I am on board with adding a "replaceFirst(regex, replacement)" method.
> This adds a bit more functionality and is just exposing another Java String
> method.
>
> In addition to that, I would suggest doing something to alleviate the
> confusion between "replace" and "replaceAll". In a similar fashion to
> adding decimal support, I see two avenues we could take:
>
> 1. Change the names of the functions to "replaceLiteral" and
> "replaceRegex" (or "replaceAllLiteral" and "replaceAllRegex")
> 2. Remove one of the methods and add a third field to the remaining method
> to indicate whether to replace literally or interpret as a regex
>
> Both of these would be breaking changes and introduced with 1.0 and I am
> leaning towards option 1 with the base name "replace". I believe when the
> "replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would
> be easy to understand that they replace all occurrences.
>
> Joe
>
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com
>
>
>
> On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto <al...@apache.org>
> wrote:
>
>
>
> Hi all,
>
> During investigation of an expression language issue posted to the list, I
> discovered that replace explicitly delegates to a String#replace invocation
> that only accepts literal expressions, not regular expressions, while
> replaceAll accepts regular expressions. I thought this was an oversight and
> filed NIFI-1919 [1] to document and fix this, by changing the
> ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular
> expressions. I wrote failing unit tests [3] to capture the fix. After
> implementing the change, two existing unit tests [4] broke, which indicated
> a regression. At first, I believed these two tests to be incorrect, but
> further investigation showed they were merely surprising.
>
> TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call,
> but the test is asserting that replace should replace multiple instances of
> the single quote. While this is similar to String#replace, because the
> expression language exposes only two methods — replace vs. replaceAll — one
> could easily assume the difference between the two was the number of
> attempted replacements, rather than the actual difference, which is literal
> expression vs. pattern.
>
> @Test
> public void testQuotingQuotes() {
> final Map<String, String> attributes = new HashMap<>();
> attributes.put("xx", "say 'hi'");
>
> String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}";
> verifyEquals(query, attributes, "say \"hello\"");
>
> query = "${xx:replace( \"'\", '\"')}";
> verifyEquals(query, attributes, "say \"hi\"");
>
> query = "${xx:replace( '\\'', '\"')}";
> System.out.println(query);
> verifyEquals(query, attributes, "say \"hi\"");
> }
> TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails
> on the first verifyEquals call with a PatternSyntaxException. I am
> investigating that further.
>
> @Test
> public void testReplaceAllWithOddNumberOfBackslashPairs() {
> final Map<String, String> attributes = new HashMap<>();
> attributes.put("filename", "C:\\temp\\.txt");
>
> verifyEquals("${filename:replace('\\\\', '/')}", attributes,
> "C:/temp/.txt");
> verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes,
> "C:/temp/.txt");
> verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes,
> "C:\\temp");
> }
> While I originally had just modified replace, after looking at the EL
> documentation [5], replace is explicitly documented to only replace literal
> expressions, and does so multiple times, as does Java’s String#replace [6].
> I now propose to add another method replaceFirst, which accepts a pattern
> and replaces only the first occurrence. I will update the unit tests to
> properly capture this, and will update the documentation to reflect the new
> method.
>
> Thoughts from the community?
>
> [1] https://issues.apache.org/jira/browse/NIFI-1919
> [2]
>
> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java
> [3]
>
> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy
> [4]
>
> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java
> [5]
>
> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace
> [6]
>
> https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)
>
>
>
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
>
>

Re: [discuss] Proposed addition of replaceRegex to expression language

Posted by Andy LoPresto <al...@apache.org>.
Hi Joe,

I think what you proposed is more than is necessary. replaceAll is functionally complete already, as if it is passed a literal expression, it will compile it as such and replace all instances in the subject string. I think the confusing point is that the difference between replace and replaceAll is not the number of times the replacement is attempted, but rather the literal vs. regular expression compilation. As Joe Percivall suggested, I think three methods (replace(literal), replaceFirst(regex/literal), and replaceAll(regex/literal)) are sufficient.

Having the replaceRegex and replaceLiteral functions with an additional parameter to control occurrence count would be difficult, as we would either need to expose some kind of enum or perform string matching to interpret user-entered flag values.

I don’t think replace and replaceAll need to be EOL-ed, I think the documentation just needs to call out the behavior very explicitly, and I believe having a new option replaceFirst will also help to suggest the differences between the methods even if users do not read the entire documentation.


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On May 26, 2016, at 10:09 AM, Joe Skora <js...@gmail.com> wrote:
> 
> I think there is transitional path that is non-breaking and could prevent
> 0.x to 1.x migration issues, even if at the expense of a little extra code
> for a period of time.
> 
>   1. Leave the existing "replace" and "replaceAll" functions as is for
>   now.
>   2. Implement the new "replaceRegex" and "replaceLiteral" functions.  I'd
>   prefer giving them a parameter selecting first, last, or all occurrences
>   over function variants, it could even be optional with all as the default.
>   3. In a future release deprecate the "replace" and "replaceAll"
>   functions.
>   4. In a future, future release remove the "replace" and "replaceAll"
>   functions.
> 
> From past experience managing enterprise systems, users preferred a
> transition period like this instead of requiring everything be updated at
> once.  We could then provide them a means to find any outstanding use of
> the old functionality and clean it up before it was retired.  And from the
> system development and management perspective, that saved us a lot of pain
> too since we didn't have an influx of minor but troublesome issues in the
> midst transitions that could demand our attention be on other bigger
> problems.
> 
> It would be possible to have a signle function that takes another parameter
> indicating whether the target is a literal or regular expression, but I
> like the separate functions, especially if they will be implemented with
> different underlying APIs.
> 
> On Wed, May 25, 2016 at 10:43 PM, Andy LoPresto <alopresto@apache.org <ma...@apache.org>>
> wrote:
> 
>> Joe,
>> 
>> These would be breaking changes and a lot of existing workflows would
>> begin to behave differently. I would suggest making an incremental change
>> here — simply adding replaceFirst as a non-destructive change as a solution
>> for this issue, and opening a new Jira for the changes which break backward
>> compatibility.
>> 
>> I do agree that option 1 is probably the cleaner way forward, and if we
>> introduce new method names, we may be able to use StandardFlowSynchronizer
>> to detect legacy methods from a pre-1.0 flow and update them automatically
>> during flow migration.
>> 
>> Andy LoPresto
>> alopresto@apache.org
>> *alopresto.apache@gmail.com <ma...@gmail.com> <alopresto.apache@gmail.com <ma...@gmail.com>>*
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>> 
>> On May 25, 2016, at 4:06 PM, Joe Percivall <joepercivall@yahoo.com.INVALID <ma...@yahoo.com.invalid>
>> <joepercivall@yahoo.com.invalid <ma...@yahoo.com.invalid>>> wrote:
>> 
>> Andy,
>> 
>> Nice write-up and thanks for bringing attention to this. I definitely
>> assumed for a while that replace vs replaceAll was the number of things
>> replaced. The underlying problem, I think, is that these EL methods are
>> just wrappers around the Java String methods and the Java String methods
>> are named in a confusing manner.
>> 
>> I am on board with adding a "replaceFirst(regex, replacement)" method.
>> This adds a bit more functionality and is just exposing another Java String
>> method.
>> 
>> In addition to that, I would suggest doing something to alleviate the
>> confusion between "replace" and "replaceAll". In a similar fashion to
>> adding decimal support, I see two avenues we could take:
>> 
>> 1. Change the names of the functions to "replaceLiteral" and
>> "replaceRegex" (or "replaceAllLiteral" and "replaceAllRegex")
>> 2. Remove one of the methods and add a third field to the remaining method
>> to indicate whether to replace literally or interpret as a regex
>> 
>> Both of these would be breaking changes and introduced with 1.0 and I am
>> leaning towards option 1 with the base name "replace". I believe when the
>> "replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would
>> be easy to understand that they replace all occurrences.
>> 
>> Joe
>> 
>> - - - - - -
>> Joseph Percivall
>> linkedin.com/in/Percivall <http://linkedin.com/in/Percivall>
>> e: joepercivall@yahoo.com <ma...@yahoo.com>
>> 
>> 
>> 
>> On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto <alopresto@apache.org <ma...@apache.org>>
>> wrote:
>> 
>> 
>> 
>> Hi all,
>> 
>> During investigation of an expression language issue posted to the list, I
>> discovered that replace explicitly delegates to a String#replace invocation
>> that only accepts literal expressions, not regular expressions, while
>> replaceAll accepts regular expressions. I thought this was an oversight and
>> filed NIFI-1919 [1] to document and fix this, by changing the
>> ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular
>> expressions. I wrote failing unit tests [3] to capture the fix. After
>> implementing the change, two existing unit tests [4] broke, which indicated
>> a regression. At first, I believed these two tests to be incorrect, but
>> further investigation showed they were merely surprising.
>> 
>> TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call,
>> but the test is asserting that replace should replace multiple instances of
>> the single quote. While this is similar to String#replace, because the
>> expression language exposes only two methods — replace vs. replaceAll — one
>> could easily assume the difference between the two was the number of
>> attempted replacements, rather than the actual difference, which is literal
>> expression vs. pattern.
>> 
>> @Test
>> public void testQuotingQuotes() {
>> final Map<String, String> attributes = new HashMap<>();
>> attributes.put("xx", "say 'hi'");
>> 
>> String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}";
>> verifyEquals(query, attributes, "say \"hello\"");
>> 
>> query = "${xx:replace( \"'\", '\"')}";
>> verifyEquals(query, attributes, "say \"hi\"");
>> 
>> query = "${xx:replace( '\\'' <smb://''>, '\"')}";
>> System.out.println(query);
>> verifyEquals(query, attributes, "say \"hi\"");
>> }
>> TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails
>> on the first verifyEquals call with a PatternSyntaxException. I am
>> investigating that further.
>> 
>> @Test
>> public void testReplaceAllWithOddNumberOfBackslashPairs() {
>> final Map<String, String> attributes = new HashMap<>();
>> attributes.put("filename", "C:\\temp\\.txt");
>> 
>> verifyEquals("${filename:replace('\\\\', '/')}", attributes,
>> "C:/temp/.txt");
>> verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes,
>> "C:/temp/.txt");
>> verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes,
>> "C:\\temp");
>> }
>> While I originally had just modified replace, after looking at the EL
>> documentation [5], replace is explicitly documented to only replace literal
>> expressions, and does so multiple times, as does Java’s String#replace [6].
>> I now propose to add another method replaceFirst, which accepts a pattern
>> and replaces only the first occurrence. I will update the unit tests to
>> properly capture this, and will update the documentation to reflect the new
>> method.
>> 
>> Thoughts from the community?
>> 
>> [1] https://issues.apache.org/jira/browse/NIFI-1919 <https://issues.apache.org/jira/browse/NIFI-1919>
>> [2]
>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java>
>> [3]
>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy>
>> [4]
>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java>
>> [5]
>> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace <https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace>
>> [6]
>> https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence) <https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)>
>> 
>> 
>> 
>> Andy LoPresto
>> alopresto@apache.org <ma...@apache.org>
>> alopresto.apache@gmail.com <ma...@gmail.com>
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69


Re: [discuss] Proposed addition of replaceRegex to expression language

Posted by Joe Skora <js...@gmail.com>.
I think there is transitional path that is non-breaking and could prevent
0.x to 1.x migration issues, even if at the expense of a little extra code
for a period of time.

   1. Leave the existing "replace" and "replaceAll" functions as is for
   now.
   2. Implement the new "replaceRegex" and "replaceLiteral" functions.  I'd
   prefer giving them a parameter selecting first, last, or all occurrences
   over function variants, it could even be optional with all as the default.
   3. In a future release deprecate the "replace" and "replaceAll"
   functions.
   4. In a future, future release remove the "replace" and "replaceAll"
   functions.

From past experience managing enterprise systems, users preferred a
transition period like this instead of requiring everything be updated at
once.  We could then provide them a means to find any outstanding use of
the old functionality and clean it up before it was retired.  And from the
system development and management perspective, that saved us a lot of pain
too since we didn't have an influx of minor but troublesome issues in the
midst transitions that could demand our attention be on other bigger
problems.

It would be possible to have a signle function that takes another parameter
indicating whether the target is a literal or regular expression, but I
like the separate functions, especially if they will be implemented with
different underlying APIs.

On Wed, May 25, 2016 at 10:43 PM, Andy LoPresto <al...@apache.org>
wrote:

> Joe,
>
> These would be breaking changes and a lot of existing workflows would
> begin to behave differently. I would suggest making an incremental change
> here — simply adding replaceFirst as a non-destructive change as a solution
> for this issue, and opening a new Jira for the changes which break backward
> compatibility.
>
> I do agree that option 1 is probably the cleaner way forward, and if we
> introduce new method names, we may be able to use StandardFlowSynchronizer
> to detect legacy methods from a pre-1.0 flow and update them automatically
> during flow migration.
>
> Andy LoPresto
> alopresto@apache.org
> *alopresto.apache@gmail.com <al...@gmail.com>*
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On May 25, 2016, at 4:06 PM, Joe Percivall <joepercivall@yahoo.com.INVALID
> <jo...@yahoo.com.invalid>> wrote:
>
> Andy,
>
> Nice write-up and thanks for bringing attention to this. I definitely
> assumed for a while that replace vs replaceAll was the number of things
> replaced. The underlying problem, I think, is that these EL methods are
> just wrappers around the Java String methods and the Java String methods
> are named in a confusing manner.
>
> I am on board with adding a "replaceFirst(regex, replacement)" method.
> This adds a bit more functionality and is just exposing another Java String
> method.
>
> In addition to that, I would suggest doing something to alleviate the
> confusion between "replace" and "replaceAll". In a similar fashion to
> adding decimal support, I see two avenues we could take:
>
> 1. Change the names of the functions to "replaceLiteral" and
> "replaceRegex" (or "replaceAllLiteral" and "replaceAllRegex")
> 2. Remove one of the methods and add a third field to the remaining method
> to indicate whether to replace literally or interpret as a regex
>
> Both of these would be breaking changes and introduced with 1.0 and I am
> leaning towards option 1 with the base name "replace". I believe when the
> "replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would
> be easy to understand that they replace all occurrences.
>
> Joe
>
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com
>
>
>
> On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto <al...@apache.org>
> wrote:
>
>
>
> Hi all,
>
> During investigation of an expression language issue posted to the list, I
> discovered that replace explicitly delegates to a String#replace invocation
> that only accepts literal expressions, not regular expressions, while
> replaceAll accepts regular expressions. I thought this was an oversight and
> filed NIFI-1919 [1] to document and fix this, by changing the
> ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular
> expressions. I wrote failing unit tests [3] to capture the fix. After
> implementing the change, two existing unit tests [4] broke, which indicated
> a regression. At first, I believed these two tests to be incorrect, but
> further investigation showed they were merely surprising.
>
> TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call,
> but the test is asserting that replace should replace multiple instances of
> the single quote. While this is similar to String#replace, because the
> expression language exposes only two methods — replace vs. replaceAll — one
> could easily assume the difference between the two was the number of
> attempted replacements, rather than the actual difference, which is literal
> expression vs. pattern.
>
> @Test
> public void testQuotingQuotes() {
> final Map<String, String> attributes = new HashMap<>();
> attributes.put("xx", "say 'hi'");
>
> String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}";
> verifyEquals(query, attributes, "say \"hello\"");
>
> query = "${xx:replace( \"'\", '\"')}";
> verifyEquals(query, attributes, "say \"hi\"");
>
> query = "${xx:replace( '\\'', '\"')}";
> System.out.println(query);
> verifyEquals(query, attributes, "say \"hi\"");
> }
> TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails
> on the first verifyEquals call with a PatternSyntaxException. I am
> investigating that further.
>
> @Test
> public void testReplaceAllWithOddNumberOfBackslashPairs() {
> final Map<String, String> attributes = new HashMap<>();
> attributes.put("filename", "C:\\temp\\.txt");
>
> verifyEquals("${filename:replace('\\\\', '/')}", attributes,
> "C:/temp/.txt");
> verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes,
> "C:/temp/.txt");
> verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes,
> "C:\\temp");
> }
> While I originally had just modified replace, after looking at the EL
> documentation [5], replace is explicitly documented to only replace literal
> expressions, and does so multiple times, as does Java’s String#replace [6].
> I now propose to add another method replaceFirst, which accepts a pattern
> and replaces only the first occurrence. I will update the unit tests to
> properly capture this, and will update the documentation to reflect the new
> method.
>
> Thoughts from the community?
>
> [1] https://issues.apache.org/jira/browse/NIFI-1919
> [2]
> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java
> [3]
> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy
> [4]
> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java
> [5]
> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace
> [6]
> https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)
>
>
>
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
>
>

Re: [discuss] Proposed addition of replaceRegex to expression language

Posted by Andy LoPresto <al...@apache.org>.
Joe,

These would be breaking changes and a lot of existing workflows would begin to behave differently. I would suggest making an incremental change here — simply adding replaceFirst as a non-destructive change as a solution for this issue, and opening a new Jira for the changes which break backward compatibility.

I do agree that option 1 is probably the cleaner way forward, and if we introduce new method names, we may be able to use StandardFlowSynchronizer to detect legacy methods from a pre-1.0 flow and update them automatically during flow migration.

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On May 25, 2016, at 4:06 PM, Joe Percivall <jo...@yahoo.com.INVALID> wrote:
> 
> Andy,
> 
> Nice write-up and thanks for bringing attention to this. I definitely assumed for a while that replace vs replaceAll was the number of things replaced. The underlying problem, I think, is that these EL methods are just wrappers around the Java String methods and the Java String methods are named in a confusing manner.
> 
> I am on board with adding a "replaceFirst(regex, replacement)" method. This adds a bit more functionality and is just exposing another Java String method.
> 
> In addition to that, I would suggest doing something to alleviate the confusion between "replace" and "replaceAll". In a similar fashion to adding decimal support, I see two avenues we could take:
> 
> 1. Change the names of the functions to "replaceLiteral" and "replaceRegex" (or "replaceAllLiteral" and "replaceAllRegex")
> 2. Remove one of the methods and add a third field to the remaining method to indicate whether to replace literally or interpret as a regex
> 
> Both of these would be breaking changes and introduced with 1.0 and I am leaning towards option 1 with the base name "replace". I believe when the "replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would be easy to understand that they replace all occurrences.
> 
> Joe
> 
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com
> 
> 
> 
> On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto <al...@apache.org> wrote:
> 
> 
> 
> Hi all,
> 
> During investigation of an expression language issue posted to the list, I discovered that replace explicitly delegates to a String#replace invocation that only accepts literal expressions, not regular expressions, while replaceAll accepts regular expressions. I thought this was an oversight and filed NIFI-1919 [1] to document and fix this, by changing the ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular expressions. I wrote failing unit tests [3] to capture the fix. After implementing the change, two existing unit tests [4] broke, which indicated a regression. At first, I believed these two tests to be incorrect, but further investigation showed they were merely surprising.
> 
> TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call, but the test is asserting that replace should replace multiple instances of the single quote. While this is similar to String#replace, because the expression language exposes only two methods — replace vs. replaceAll — one could easily assume the difference between the two was the number of attempted replacements, rather than the actual difference, which is literal expression vs. pattern.
> 
> @Test
> public void testQuotingQuotes() {
> final Map<String, String> attributes = new HashMap<>();
> attributes.put("xx", "say 'hi'");
> 
> String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}";
> verifyEquals(query, attributes, "say \"hello\"");
> 
> query = "${xx:replace( \"'\", '\"')}";
> verifyEquals(query, attributes, "say \"hi\"");
> 
> query = "${xx:replace( '\\'', '\"')}";
> System.out.println(query);
> verifyEquals(query, attributes, "say \"hi\"");
> }
> TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails on the first verifyEquals call with a PatternSyntaxException. I am investigating that further.
> 
> @Test
> public void testReplaceAllWithOddNumberOfBackslashPairs() {
> final Map<String, String> attributes = new HashMap<>();
> attributes.put("filename", "C:\\temp\\.txt");
> 
> verifyEquals("${filename:replace('\\\\', '/')}", attributes, "C:/temp/.txt");
> verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes, "C:/temp/.txt");
> verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes, "C:\\temp");
> }
> While I originally had just modified replace, after looking at the EL documentation [5], replace is explicitly documented to only replace literal expressions, and does so multiple times, as does Java’s String#replace [6]. I now propose to add another method replaceFirst, which accepts a pattern and replaces only the first occurrence. I will update the unit tests to properly capture this, and will update the documentation to reflect the new method.
> 
> Thoughts from the community?
> 
> [1] https://issues.apache.org/jira/browse/NIFI-1919
> [2] https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java
> [3] https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy
> [4] https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java
> [5] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace
> [6] https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)
> 
> 
> 
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69


Re: [discuss] Proposed addition of replaceRegex to expression language

Posted by Joe Percivall <jo...@yahoo.com.INVALID>.
Andy,

Nice write-up and thanks for bringing attention to this. I definitely assumed for a while that replace vs replaceAll was the number of things replaced. The underlying problem, I think, is that these EL methods are just wrappers around the Java String methods and the Java String methods are named in a confusing manner. 
 
I am on board with adding a "replaceFirst(regex, replacement)" method. This adds a bit more functionality and is just exposing another Java String method.

In addition to that, I would suggest doing something to alleviate the confusion between "replace" and "replaceAll". In a similar fashion to adding decimal support, I see two avenues we could take:

1. Change the names of the functions to "replaceLiteral" and "replaceRegex" (or "replaceAllLiteral" and "replaceAllRegex")
2. Remove one of the methods and add a third field to the remaining method to indicate whether to replace literally or interpret as a regex

Both of these would be breaking changes and introduced with 1.0 and I am leaning towards option 1 with the base name "replace". I believe when the "replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would be easy to understand that they replace all occurrences.

Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com



On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto <al...@apache.org> wrote:



Hi all,

During investigation of an expression language issue posted to the list, I discovered that replace explicitly delegates to a String#replace invocation that only accepts literal expressions, not regular expressions, while replaceAll accepts regular expressions. I thought this was an oversight and filed NIFI-1919 [1] to document and fix this, by changing the ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular expressions. I wrote failing unit tests [3] to capture the fix. After implementing the change, two existing unit tests [4] broke, which indicated a regression. At first, I believed these two tests to be incorrect, but further investigation showed they were merely surprising. 

TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call, but the test is asserting that replace should replace multiple instances of the single quote. While this is similar to String#replace, because the expression language exposes only two methods — replace vs. replaceAll — one could easily assume the difference between the two was the number of attempted replacements, rather than the actual difference, which is literal expression vs. pattern. 

@Test
public void testQuotingQuotes() {
final Map<String, String> attributes = new HashMap<>();
attributes.put("xx", "say 'hi'");

String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}";
verifyEquals(query, attributes, "say \"hello\"");

query = "${xx:replace( \"'\", '\"')}";
verifyEquals(query, attributes, "say \"hi\"");

query = "${xx:replace( '\\'', '\"')}";
System.out.println(query);
verifyEquals(query, attributes, "say \"hi\"");
}
TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails on the first verifyEquals call with a PatternSyntaxException. I am investigating that further.

@Test
public void testReplaceAllWithOddNumberOfBackslashPairs() {
final Map<String, String> attributes = new HashMap<>();
attributes.put("filename", "C:\\temp\\.txt");

verifyEquals("${filename:replace('\\\\', '/')}", attributes, "C:/temp/.txt");
verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes, "C:/temp/.txt");
verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes, "C:\\temp");
}
While I originally had just modified replace, after looking at the EL documentation [5], replace is explicitly documented to only replace literal expressions, and does so multiple times, as does Java’s String#replace [6]. I now propose to add another method replaceFirst, which accepts a pattern and replaces only the first occurrence. I will update the unit tests to properly capture this, and will update the documentation to reflect the new method. 

Thoughts from the community?

[1] https://issues.apache.org/jira/browse/NIFI-1919
[2] https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java
[3] https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy
[4] https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java
[5] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace
[6] https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)



Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69