You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@nifi.apache.org by Joe Percivall <jo...@yahoo.com.INVALID> on 2016/04/21 04:16:26 UTC

[Discuss] Expression Language Options to Add Decimal Support

Hello Dev list,
I've been working on a couple improvements that deal with utilizing expression language to do data analysis and this has exposed a couple issues with the typing of numbers in Expression Language (EL). The improvements are a continuation of the topic I presented on at the Apache NiFi MeetUp group in MD[1]. 
The primary issue I came across is that currently in EL all numbers interpreted as longs[2], which only store whole numbers. This leads to problems when trying to do things like dividing whole numbers or just trying to add/subtract decimals. I am actually surprised that the request for decimals this hasn't come up before. That being said, after some initial discussion with Tony and Joe, I believe that there are four potential ways forward.
1: Create a new EL type "decimal" backed by a double[3] and new methods to support it ("add_decimal"): This allows the user to explicitly choose whether or not they want to use decimal or whole numbers. It retains the simple use-cases that use whole numbers while opening up the new use-cases using decimals. One down side is that it is more verbose. It means adding a new function for each math operation. This is backwards compatible.

2: Back all numbers by doubles instead of longs: The easy to implement and retains very concise methods ("add", "subtract", etc..). A few cons, doubles have a lower precision than longs[4],  can lead to rounding errors[5] and could be confusing for users who only work with whole numbers to suddenly find decimals.This is not backwards compatible.
3: Create a new EL type "decimal" back by a double[3] and attempt to smartly automatically cast depending on method/input/output: This would allow for the positives of having decimals and whole numbers in addition to having concise methods. The main cons being a much longer implementation time to do it right and the "shadiness" of doing things automatically for the user. Also this would mean the user wouldn't have the option to explicitly override  This is not backwards compatible.
4: Create a new EL type "decimal" backed by a double[3] and overload the existing methods with an additional parameter to specify return type to support it: This would allow for the positives of having decimals and whole numbers in addition to having concise method names but this may cause confusion with less technical users who aren't used to specifying return types. This is backwards compatible.

The options that are not backwards compatible would need to wait to be implemented until 1.0.
The current option I am leaning towards is number 1 due to the explicitness and greater control it gives the user. While it is more verbose I think the decimal vs whole number syntax will be easy for even non-technical users to pick up. Also I currently have a PR for it up here[6].
Any other ideas or suggestions are welcome!
[1] http://www.meetup.com/ApacheNiFi/events/229158777/
[2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
[3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
[4] https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
[5] http://stackoverflow.com/questions/960072/rounding-errors
[6] https://issues.apache.org/jira/browse/NIFI-1662
Joe- - - - - - Joseph Percivalllinkedin.com/in/Percivalle: joepercivall@yahoo.com

Re: [Discuss] Expression Language Options to Add Decimal Support

Posted by Joe Percivall <jo...@yahoo.com.INVALID>.
The only time the regex the Regex would be applied to a FlowFile attribute or escaped string is when it's passed to a function that expects a number. If we decide to allow literal decimals in expression language (${value:plus(3.2)} instead of ${value:plus("3.2")}) then a double regex would be used by the EL parser to evaluate the expression. I looked for un-quoted literal dots and I don't see any in any EL examples in the documentation. The replace example does have a dot but it is within a String. 

Mark, can you comment on how you designed EL to use dots and why they are classified as their own literal?

Andy, that is a good point about users just not knowing. The best we can do though is to make sure to call it out as a new feature in the release notes and, like you said, have bullet-proof documentation.


- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com



On Monday, April 25, 2016 11:29 PM, Andy LoPresto <al...@gmail.com> wrote:



My question about auto-casting was when the regex would be applied -- after every function execution or just ones that return a numeric value. I think you answered that only the second would apply. Also, the documentation uses "." in a String replacement example without any escaping (can't find link now as I am on mobile but I believe it's on the Expression Language Guide page under "replace"). This supports my point that our documentation must be bullet-proof for people who are not already developers and have had to deal with integer/decimal math operations or escaped characters on a regular basis. 

As a developer, I agree with prioritizing dev-usability and completeness of features over ease of adoption for general users. However, I do think as NiFi adoption grows, the percentage of our users who are non-developers will grow substantially, perhaps eventually becoming a majority. 

I'm also not sure that we can safely assume that all users are aware that currently, only integer operations are supported. Unless they read the documentation carefully or manually inspect/unit test their flows before deploying, I can see users existing who believe they are performing decimal division (having used variable replacement and never seen actual value substitution) and will be surprised by this revelation when it is announced in release notes (or perhaps will never notice a change). 

Andy LoPresto
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69


> On Apr 25, 2016, at 19:35, Joe Percivall <jo...@yahoo.com.INVALID> wrote:
> 
> Thanks for going through each of your concerns Andy and I will address each:
> 
> 
> 1: I agree that option 3 would be most concise and would address this loss of precision.
> 2: There are a couple points here:
>        1. Using decimal points: currently decimal points are reserved characters in expression language[1] but I'm not sure that they are used anywhere (I only see two matches for "DOT", in the lexer and parser). I'm not entirely sure how prioritization would work when adding the Regex to match a decimal point for a double. In the PR I have open[2]  (uses option 1 and will be closed) I explicitly require quoting the decimal to allow easier parsing. Also of note, you can only pass a quoted String or attribute name as the expression value subject (the thing getting a method called on it). So “${1.2:toNumber()}” would not be valid but “${"1.2":toNumber()}” would be.
>        2. Evaluation of String which would parse to a double: When you run "${“1-2”:replace(“-“, “.”)}” you merely create a String with the contents "1.2". This is because the type only changes after being run through a method which returns a specific type.
>        3. Long Term users: I agree that long term users would need to adjust their thinking but for a different reason. They have only ever used/had access to using whole numbers. This change would require users to change their thinking to think of new use-cases. If they don't want to adjust then they can continue only using whole numbers the exact same way they always have.
>        4: User acceptance: I agree that specifying input type to get a specific output type will be something new to non-dev users and would require good documentation with good examples. That being said I think the added learning curve is worth the added functionality. 
> 
> [1] https://github.com/apache/nifi/blob/32398693d5917e35d3a6cd1549edd795e16fa8ec/nifi-commons/nifi-expression-language/src/main/antlr3/org/apache/nifi/attribute/expression/language/antlr/AttributeExpressionLexer.g#L80-L80
> [2] https://github.com/apache/nifi/pull/310
> 
> Joe
> 
> - - - - - - 
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com
> 
> 
> 
> On Monday, April 25, 2016 9:31 PM, Andy LoPresto <al...@apache.org> wrote:
> 
> 
> 
> Mark made some good points in his support of option 3. I’ll try to list some of the scenarios I’m worried about, and if we can address each, I’m fine with that choice. For the rest of this message, “user” refers to the person writing the EL expression. They may or may not be a developer, but have read our existing EL documentation. 
> 
> 1. Loss of precision — currently there is an issue where division of Long values can result in a loss of precision. I have provided a failing test here [1]. I believe this would be resolved by any of the proposed solutions, but the auto-infer mechanism of option 3 would be the most concise. 
> 2. Auto-casting — I’m not sure how the “auto-interpretation” would occur. Would it only happen on the result of mathematical operations? Would “${1.2:toNumber()}” return “1” as a Long or “1.2” as a Decimal? What about “${“1-2”:replace(“-“, “.”)}”? Is this a String with contents “1.2” or a Number (either Long or double)? The current “single numeric type” language has (I imagine) conditioned users to associate any numerical value with the Number type. I believe splitting decimal values out to “Decimal” type is correct, but will require very clear documentation and probably a concerted education effort for long-time users. As Joe pointed out, if a user wanted a decimal result of division, they would need to specify a decimal dividend or divisor, rather than operate on the quotient. This is accepted practice by developers, but is unlikely to be familiar with non-developers (i.e. writing “${“10”:divide(“3”):toDecimal()}” would result in “3”; “${“10”:divide(“3”:toDecimal())}” or ${“10”:toDecimal():divide(“3”)}” are necessary to get a result of “3.33…”)
> 
> I think as long as the user experience and concise and explicit documentation is held in paramount, I am ok with option 3. I just see the most potential for user confusion there, especially after some recent mailing list questions about the expression language functions and documentation. 
> 
> [1] https://github.com/alopresto/nifi/blob/el_double/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy#L108
> 
> 
> 
> 
> 
> 
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On Apr 25, 2016, at 7:22 AM, Pierre Villard <pi...@gmail.com> wrote:
>> 
>> With explanations from Mark, option 3 is fine with me especially as it is
>> backward compatible.
>> 
>> Pierre
>> 
>> 2016-04-25 16:14 GMT+02:00 Joe Percivall <jo...@yahoo.com.invalid>:
>> 
>> 
>> Pierre and Andy,
>>> 
>>> How do you guys feel about option 3 as Mark has explained it? Keeping in
>>> mind that it would be backwards compatible. As it stands we have differing
>>> opinions, which is a good thing, but we need to reach an agreement in order
>>> to proceed.
>>> 
>>> Joe
>>> - - - - - -
>>> Joseph Percivall
>>> linkedin.com/in/Percivall
>>> e: joepercivall@yahoo.com
>>> 
>>> 
>>> 
>>> 
>>> On Thursday, April 21, 2016 11:12 AM, Joe Percivall
>>> <jo...@yahoo.com.INVALID> wrote:
>>> Mark,
>>> 
>>> 
>>> After thinking through this a bit more I believe you're right. We can
>>> assume this logic for every operation:
>>> 
>>> Two whole numbers as input returns a whole number
>>> A whole number and a decimal as input returns a decimal
>>> Two decimals as input returns a Decimal
>>> 
>>> I believe, one problem with your explanation as written is "...infer the
>>> return type and allow the user to explicitly change it via toNumber() and
>>> toDecimal() calls if need be". More specifically you would need to call
>>> toNumber/toDecimal on the inputs in order to change the return type. You
>>> can't change the return type after the expression runs (without the
>>> potential loss of information/precision of converting between the two)
>>> because once the expression evaluator exits it needs to have already
>>> interpreted it as a long or double which would occur before the user is
>>> able to call toNumber/toDecimal. So the user would have to control the
>>> return type would by calling toNumber/toDecimal on the inputs.
>>> 
>>> That leads to the one confusing bit about this approach, when a user
>>> attempts to do an operation on two whole numbers that they would expect to
>>> return a decimal (like 8/5 = 1.6). In order to get their expected output
>>> the user would have to know to explicitly convert at least one of them to a
>>> double by using the toDecimal operation.
>>> 
>>> That being said it's probably that's worth it to keep down the verbosity
>>> and still have backwards compatibility.
>>> 
>>> 
>>> Joe
>>> - - - - - - Joseph Percivall
>>> linkedin.com/in/Percivall
>>> e: joepercivall@yahoo.com
>>> 
>>> 
>>> 
>>> 
>>> On Thursday, April 21, 2016 10:20 AM, Mark Payne <ma...@hotmail.com>
>>> wrote:
>>> 
>>> 
>>> 
>>> Joe,
>>> 
>>> I am definitely in favor of #3. I think if we perform some sort of binary
>>> operation on two numbers,
>>> we should return a Decimal type if either of the operands is a Decimal and
>>> a Number (Long) type
>>> if both operands are Longs. We would also need a toDecimal() method that
>>> would convert a Number
>>> type ot a Decimal type and we would need to support calling toNumber() on
>>> a decimal as well.
>>> 
>>> Given this, though, I think it makes sense to infer the return type and
>>> allow the user to explicitly
>>> change it via toNumber() and toDecimal() calls if need be. With the
>>> addition of these functions, I don't
>>> think it would be 'shady' to interpret the types at all. We already do
>>> interpret types in many cases, for
>>> instance when we have an expression such as ${num1:divide(${num2}) } -- in
>>> this case, num1 and num2
>>> are attributes, so they are strings. We already implicitly convert them to
>>> numbers. Going forward, we should
>>> convert them to either Number or Decimal type based on the regex that they
>>> match. I believe this also
>>> addresses the concern of not being able to override. I don't think
>>> backward compatibility is an issue here
>>> either, as we currently would just fail and throw an Exception if
>>> attempting to divide a string like "2.3"
>>> 
>>> Number 3 seems like a slam-dunk to me in terms of pros vs cons.
>>> 
>>> Thanks
>>> -Mark
>>> 
>>> 
>>> 
>>> 
>>> On Apr 20, 2016, at 10:16 PM, Joe Percivall
>>>> <jo...@yahoo.com.INVALID> wrote:
>>> 
>>> 
>>>> Hello Dev list,
>>>> I've been working on a couple improvements that deal with utilizing
>>>> expression language to do data analysis and this has exposed a couple
>>> issues with the typing of numbers in Expression Language (EL). The
>>> improvements are a continuation of the topic I presented on at the Apache
>>> NiFi MeetUp group in MD[1].
>>> 
>>> The primary issue I came across is that currently in EL all numbers
>>>> interpreted as longs[2], which only store whole numbers. This leads to
>>> problems when trying to do things like dividing whole numbers or just
>>> trying to add/subtract decimals. I am actually surprised that the request
>>> for decimals this hasn't come up before. That being said, after some
>>> initial discussion with Tony and Joe, I believe that there are four
>>> potential ways forward.
>>> 
>>> 1: Create a new EL type "decimal" backed by a double[3] and new methods
>>>> to support it ("add_decimal"): This allows the user to explicitly choose
>>> whether or not they want to use decimal or whole numbers. It retains the
>>> simple use-cases that use whole numbers while opening up the new use-cases
>>> using decimals. One down side is that it is more verbose. It means adding a
>>> new function for each math operation. This is backwards compatible.
>>> 
>>> 
>>>> 2: Back all numbers by doubles instead of longs: The easy to implement
>>>> and retains very concise methods ("add", "subtract", etc..). A few cons,
>>> doubles have a lower precision than longs[4],  can lead to rounding
>>> errors[5] and could be confusing for users who only work with whole numbers
>>> to suddenly find decimals.This is not backwards compatible.
>>> 
>>> 3: Create a new EL type "decimal" back by a double[3] and attempt to
>>>> smartly automatically cast depending on method/input/output: This would
>>> allow for the positives of having decimals and whole numbers in addition to
>>> having concise methods. The main cons being a much longer implementation
>>> time to do it right and the "shadiness" of doing things automatically for
>>> the user. Also this would mean the user wouldn't have the option to
>>> explicitly override  This is not backwards compatible.
>>> 
>>> 4: Create a new EL type "decimal" backed by a double[3] and overload the
>>>> existing methods with an additional parameter to specify return type to
>>> support it: This would allow for the positives of having decimals and whole
>>> numbers in addition to having concise method names but this may cause
>>> confusion with less technical users who aren't used to specifying return
>>> types. This is backwards compatible.
>>> 
>>> 
>>>> The options that are not backwards compatible would need to wait to be
>>>> implemented until 1.0.
>>> 
>>> The current option I am leaning towards is number 1 due to the
>>>> explicitness and greater control it gives the user. While it is more
>>> verbose I think the decimal vs whole number syntax will be easy for even
>>> non-technical users to pick up. Also I currently have a PR for it up
>>> here[6].
>>> 
>>> Any other ideas or suggestions are welcome!
>>>> [1] http://www.meetup.com/ApacheNiFi/events/229158777/
>>>> [2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
>>>> [3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
>>>> [4]
>>>> https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
>>> 
>>> [5] http://stackoverflow.com/questions/960072/rounding-errors
>>>> [6] https://issues.apache.org/jira/browse/NIFI-1662
>>>> Joe- - - - - - Joseph Percivalllinkedin.com/in/Percivalle:
>>>> joepercivall@yahoo.com
>>> 
>>>

Re: [Discuss] Expression Language Options to Add Decimal Support

Posted by Andy LoPresto <al...@gmail.com>.
My question about auto-casting was when the regex would be applied -- after every function execution or just ones that return a numeric value. I think you answered that only the second would apply. Also, the documentation uses "." in a String replacement example without any escaping (can't find link now as I am on mobile but I believe it's on the Expression Language Guide page under "replace"). This supports my point that our documentation must be bullet-proof for people who are not already developers and have had to deal with integer/decimal math operations or escaped characters on a regular basis. 

As a developer, I agree with prioritizing dev-usability and completeness of features over ease of adoption for general users. However, I do think as NiFi adoption grows, the percentage of our users who are non-developers will grow substantially, perhaps eventually becoming a majority. 

I'm also not sure that we can safely assume that all users are aware that currently, only integer operations are supported. Unless they read the documentation carefully or manually inspect/unit test their flows before deploying, I can see users existing who believe they are performing decimal division (having used variable replacement and never seen actual value substitution) and will be surprised by this revelation when it is announced in release notes (or perhaps will never notice a change). 

Andy LoPresto
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Apr 25, 2016, at 19:35, Joe Percivall <jo...@yahoo.com.INVALID> wrote:
> 
> Thanks for going through each of your concerns Andy and I will address each:
> 
> 
> 1: I agree that option 3 would be most concise and would address this loss of precision.
> 2: There are a couple points here:
>        1. Using decimal points: currently decimal points are reserved characters in expression language[1] but I'm not sure that they are used anywhere (I only see two matches for "DOT", in the lexer and parser). I'm not entirely sure how prioritization would work when adding the Regex to match a decimal point for a double. In the PR I have open[2]  (uses option 1 and will be closed) I explicitly require quoting the decimal to allow easier parsing. Also of note, you can only pass a quoted String or attribute name as the expression value subject (the thing getting a method called on it). So “${1.2:toNumber()}” would not be valid but “${"1.2":toNumber()}” would be.
>        2. Evaluation of String which would parse to a double: When you run "${“1-2”:replace(“-“, “.”)}” you merely create a String with the contents "1.2". This is because the type only changes after being run through a method which returns a specific type.
>        3. Long Term users: I agree that long term users would need to adjust their thinking but for a different reason. They have only ever used/had access to using whole numbers. This change would require users to change their thinking to think of new use-cases. If they don't want to adjust then they can continue only using whole numbers the exact same way they always have.
>        4: User acceptance: I agree that specifying input type to get a specific output type will be something new to non-dev users and would require good documentation with good examples. That being said I think the added learning curve is worth the added functionality. 
> 
> [1] https://github.com/apache/nifi/blob/32398693d5917e35d3a6cd1549edd795e16fa8ec/nifi-commons/nifi-expression-language/src/main/antlr3/org/apache/nifi/attribute/expression/language/antlr/AttributeExpressionLexer.g#L80-L80
> [2] https://github.com/apache/nifi/pull/310
> 
> Joe
> 
> - - - - - - 
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com
> 
> 
> 
> On Monday, April 25, 2016 9:31 PM, Andy LoPresto <al...@apache.org> wrote:
> 
> 
> 
> Mark made some good points in his support of option 3. I’ll try to list some of the scenarios I’m worried about, and if we can address each, I’m fine with that choice. For the rest of this message, “user” refers to the person writing the EL expression. They may or may not be a developer, but have read our existing EL documentation. 
> 
> 1. Loss of precision — currently there is an issue where division of Long values can result in a loss of precision. I have provided a failing test here [1]. I believe this would be resolved by any of the proposed solutions, but the auto-infer mechanism of option 3 would be the most concise. 
> 2. Auto-casting — I’m not sure how the “auto-interpretation” would occur. Would it only happen on the result of mathematical operations? Would “${1.2:toNumber()}” return “1” as a Long or “1.2” as a Decimal? What about “${“1-2”:replace(“-“, “.”)}”? Is this a String with contents “1.2” or a Number (either Long or double)? The current “single numeric type” language has (I imagine) conditioned users to associate any numerical value with the Number type. I believe splitting decimal values out to “Decimal” type is correct, but will require very clear documentation and probably a concerted education effort for long-time users. As Joe pointed out, if a user wanted a decimal result of division, they would need to specify a decimal dividend or divisor, rather than operate on the quotient. This is accepted practice by developers, but is unlikely to be familiar with non-developers (i.e. writing “${“10”:divide(“3”):toDecimal()}” would result in “3”; “${“10”:divide(“3”:toDecimal())}” or ${“10”:toDecimal():divide(“3”)}” are necessary to get a result of “3.33…”)
> 
> I think as long as the user experience and concise and explicit documentation is held in paramount, I am ok with option 3. I just see the most potential for user confusion there, especially after some recent mailing list questions about the expression language functions and documentation. 
> 
> [1] https://github.com/alopresto/nifi/blob/el_double/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy#L108
> 
> 
> 
> 
> 
> 
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On Apr 25, 2016, at 7:22 AM, Pierre Villard <pi...@gmail.com> wrote:
>> 
>> With explanations from Mark, option 3 is fine with me especially as it is
>> backward compatible.
>> 
>> Pierre
>> 
>> 2016-04-25 16:14 GMT+02:00 Joe Percivall <jo...@yahoo.com.invalid>:
>> 
>> 
>> Pierre and Andy,
>>> 
>>> How do you guys feel about option 3 as Mark has explained it? Keeping in
>>> mind that it would be backwards compatible. As it stands we have differing
>>> opinions, which is a good thing, but we need to reach an agreement in order
>>> to proceed.
>>> 
>>> Joe
>>> - - - - - -
>>> Joseph Percivall
>>> linkedin.com/in/Percivall
>>> e: joepercivall@yahoo.com
>>> 
>>> 
>>> 
>>> 
>>> On Thursday, April 21, 2016 11:12 AM, Joe Percivall
>>> <jo...@yahoo.com.INVALID> wrote:
>>> Mark,
>>> 
>>> 
>>> After thinking through this a bit more I believe you're right. We can
>>> assume this logic for every operation:
>>> 
>>> Two whole numbers as input returns a whole number
>>> A whole number and a decimal as input returns a decimal
>>> Two decimals as input returns a Decimal
>>> 
>>> I believe, one problem with your explanation as written is "...infer the
>>> return type and allow the user to explicitly change it via toNumber() and
>>> toDecimal() calls if need be". More specifically you would need to call
>>> toNumber/toDecimal on the inputs in order to change the return type. You
>>> can't change the return type after the expression runs (without the
>>> potential loss of information/precision of converting between the two)
>>> because once the expression evaluator exits it needs to have already
>>> interpreted it as a long or double which would occur before the user is
>>> able to call toNumber/toDecimal. So the user would have to control the
>>> return type would by calling toNumber/toDecimal on the inputs.
>>> 
>>> That leads to the one confusing bit about this approach, when a user
>>> attempts to do an operation on two whole numbers that they would expect to
>>> return a decimal (like 8/5 = 1.6). In order to get their expected output
>>> the user would have to know to explicitly convert at least one of them to a
>>> double by using the toDecimal operation.
>>> 
>>> That being said it's probably that's worth it to keep down the verbosity
>>> and still have backwards compatibility.
>>> 
>>> 
>>> Joe
>>> - - - - - - Joseph Percivall
>>> linkedin.com/in/Percivall
>>> e: joepercivall@yahoo.com
>>> 
>>> 
>>> 
>>> 
>>> On Thursday, April 21, 2016 10:20 AM, Mark Payne <ma...@hotmail.com>
>>> wrote:
>>> 
>>> 
>>> 
>>> Joe,
>>> 
>>> I am definitely in favor of #3. I think if we perform some sort of binary
>>> operation on two numbers,
>>> we should return a Decimal type if either of the operands is a Decimal and
>>> a Number (Long) type
>>> if both operands are Longs. We would also need a toDecimal() method that
>>> would convert a Number
>>> type ot a Decimal type and we would need to support calling toNumber() on
>>> a decimal as well.
>>> 
>>> Given this, though, I think it makes sense to infer the return type and
>>> allow the user to explicitly
>>> change it via toNumber() and toDecimal() calls if need be. With the
>>> addition of these functions, I don't
>>> think it would be 'shady' to interpret the types at all. We already do
>>> interpret types in many cases, for
>>> instance when we have an expression such as ${num1:divide(${num2}) } -- in
>>> this case, num1 and num2
>>> are attributes, so they are strings. We already implicitly convert them to
>>> numbers. Going forward, we should
>>> convert them to either Number or Decimal type based on the regex that they
>>> match. I believe this also
>>> addresses the concern of not being able to override. I don't think
>>> backward compatibility is an issue here
>>> either, as we currently would just fail and throw an Exception if
>>> attempting to divide a string like "2.3"
>>> 
>>> Number 3 seems like a slam-dunk to me in terms of pros vs cons.
>>> 
>>> Thanks
>>> -Mark
>>> 
>>> 
>>> 
>>> 
>>> On Apr 20, 2016, at 10:16 PM, Joe Percivall
>>>> <jo...@yahoo.com.INVALID> wrote:
>>> 
>>> 
>>>> Hello Dev list,
>>>> I've been working on a couple improvements that deal with utilizing
>>>> expression language to do data analysis and this has exposed a couple
>>> issues with the typing of numbers in Expression Language (EL). The
>>> improvements are a continuation of the topic I presented on at the Apache
>>> NiFi MeetUp group in MD[1].
>>> 
>>> The primary issue I came across is that currently in EL all numbers
>>>> interpreted as longs[2], which only store whole numbers. This leads to
>>> problems when trying to do things like dividing whole numbers or just
>>> trying to add/subtract decimals. I am actually surprised that the request
>>> for decimals this hasn't come up before. That being said, after some
>>> initial discussion with Tony and Joe, I believe that there are four
>>> potential ways forward.
>>> 
>>> 1: Create a new EL type "decimal" backed by a double[3] and new methods
>>>> to support it ("add_decimal"): This allows the user to explicitly choose
>>> whether or not they want to use decimal or whole numbers. It retains the
>>> simple use-cases that use whole numbers while opening up the new use-cases
>>> using decimals. One down side is that it is more verbose. It means adding a
>>> new function for each math operation. This is backwards compatible.
>>> 
>>> 
>>>> 2: Back all numbers by doubles instead of longs: The easy to implement
>>>> and retains very concise methods ("add", "subtract", etc..). A few cons,
>>> doubles have a lower precision than longs[4],  can lead to rounding
>>> errors[5] and could be confusing for users who only work with whole numbers
>>> to suddenly find decimals.This is not backwards compatible.
>>> 
>>> 3: Create a new EL type "decimal" back by a double[3] and attempt to
>>>> smartly automatically cast depending on method/input/output: This would
>>> allow for the positives of having decimals and whole numbers in addition to
>>> having concise methods. The main cons being a much longer implementation
>>> time to do it right and the "shadiness" of doing things automatically for
>>> the user. Also this would mean the user wouldn't have the option to
>>> explicitly override  This is not backwards compatible.
>>> 
>>> 4: Create a new EL type "decimal" backed by a double[3] and overload the
>>>> existing methods with an additional parameter to specify return type to
>>> support it: This would allow for the positives of having decimals and whole
>>> numbers in addition to having concise method names but this may cause
>>> confusion with less technical users who aren't used to specifying return
>>> types. This is backwards compatible.
>>> 
>>> 
>>>> The options that are not backwards compatible would need to wait to be
>>>> implemented until 1.0.
>>> 
>>> The current option I am leaning towards is number 1 due to the
>>>> explicitness and greater control it gives the user. While it is more
>>> verbose I think the decimal vs whole number syntax will be easy for even
>>> non-technical users to pick up. Also I currently have a PR for it up
>>> here[6].
>>> 
>>> Any other ideas or suggestions are welcome!
>>>> [1] http://www.meetup.com/ApacheNiFi/events/229158777/
>>>> [2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
>>>> [3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
>>>> [4]
>>>> https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
>>> 
>>> [5] http://stackoverflow.com/questions/960072/rounding-errors
>>>> [6] https://issues.apache.org/jira/browse/NIFI-1662
>>>> Joe- - - - - - Joseph Percivalllinkedin.com/in/Percivalle:
>>>> joepercivall@yahoo.com
>>> 
>>> 

Re: [Discuss] Expression Language Options to Add Decimal Support

Posted by Joe Percivall <jo...@yahoo.com.INVALID>.
Thanks for going through each of your concerns Andy and I will address each:


1: I agree that option 3 would be most concise and would address this loss of precision.
2: There are a couple points here:
        1. Using decimal points: currently decimal points are reserved characters in expression language[1] but I'm not sure that they are used anywhere (I only see two matches for "DOT", in the lexer and parser). I'm not entirely sure how prioritization would work when adding the Regex to match a decimal point for a double. In the PR I have open[2]  (uses option 1 and will be closed) I explicitly require quoting the decimal to allow easier parsing. Also of note, you can only pass a quoted String or attribute name as the expression value subject (the thing getting a method called on it). So “${1.2:toNumber()}” would not be valid but “${"1.2":toNumber()}” would be.
        2. Evaluation of String which would parse to a double: When you run "${“1-2”:replace(“-“, “.”)}” you merely create a String with the contents "1.2". This is because the type only changes after being run through a method which returns a specific type.
        3. Long Term users: I agree that long term users would need to adjust their thinking but for a different reason. They have only ever used/had access to using whole numbers. This change would require users to change their thinking to think of new use-cases. If they don't want to adjust then they can continue only using whole numbers the exact same way they always have.
        4: User acceptance: I agree that specifying input type to get a specific output type will be something new to non-dev users and would require good documentation with good examples. That being said I think the added learning curve is worth the added functionality. 

[1] https://github.com/apache/nifi/blob/32398693d5917e35d3a6cd1549edd795e16fa8ec/nifi-commons/nifi-expression-language/src/main/antlr3/org/apache/nifi/attribute/expression/language/antlr/AttributeExpressionLexer.g#L80-L80
[2] https://github.com/apache/nifi/pull/310

Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com



On Monday, April 25, 2016 9:31 PM, Andy LoPresto <al...@apache.org> wrote:



Mark made some good points in his support of option 3. I’ll try to list some of the scenarios I’m worried about, and if we can address each, I’m fine with that choice. For the rest of this message, “user” refers to the person writing the EL expression. They may or may not be a developer, but have read our existing EL documentation. 

1. Loss of precision — currently there is an issue where division of Long values can result in a loss of precision. I have provided a failing test here [1]. I believe this would be resolved by any of the proposed solutions, but the auto-infer mechanism of option 3 would be the most concise. 
2. Auto-casting — I’m not sure how the “auto-interpretation” would occur. Would it only happen on the result of mathematical operations? Would “${1.2:toNumber()}” return “1” as a Long or “1.2” as a Decimal? What about “${“1-2”:replace(“-“, “.”)}”? Is this a String with contents “1.2” or a Number (either Long or double)? The current “single numeric type” language has (I imagine) conditioned users to associate any numerical value with the Number type. I believe splitting decimal values out to “Decimal” type is correct, but will require very clear documentation and probably a concerted education effort for long-time users. As Joe pointed out, if a user wanted a decimal result of division, they would need to specify a decimal dividend or divisor, rather than operate on the quotient. This is accepted practice by developers, but is unlikely to be familiar with non-developers (i.e. writing “${“10”:divide(“3”):toDecimal()}” would result in “3”; “${“10”:divide(“3”:toDecimal())}” or ${“10”:toDecimal():divide(“3”)}” are necessary to get a result of “3.33…”)

I think as long as the user experience and concise and explicit documentation is held in paramount, I am ok with option 3. I just see the most potential for user confusion there, especially after some recent mailing list questions about the expression language functions and documentation. 

[1] https://github.com/alopresto/nifi/blob/el_double/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy#L108






Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Apr 25, 2016, at 7:22 AM, Pierre Villard <pi...@gmail.com> wrote:
>
>With explanations from Mark, option 3 is fine with me especially as it is
>backward compatible.
>
>Pierre
>
>2016-04-25 16:14 GMT+02:00 Joe Percivall <jo...@yahoo.com.invalid>:
>
>
>Pierre and Andy,
>>
>>How do you guys feel about option 3 as Mark has explained it? Keeping in
>>mind that it would be backwards compatible. As it stands we have differing
>>opinions, which is a good thing, but we need to reach an agreement in order
>>to proceed.
>>
>>Joe
>>- - - - - -
>>Joseph Percivall
>>linkedin.com/in/Percivall
>>e: joepercivall@yahoo.com
>>
>>
>>
>>
>>On Thursday, April 21, 2016 11:12 AM, Joe Percivall
>><jo...@yahoo.com.INVALID> wrote:
>>Mark,
>>
>>
>>After thinking through this a bit more I believe you're right. We can
>>assume this logic for every operation:
>>
>>Two whole numbers as input returns a whole number
>>A whole number and a decimal as input returns a decimal
>>Two decimals as input returns a Decimal
>>
>>I believe, one problem with your explanation as written is "...infer the
>>return type and allow the user to explicitly change it via toNumber() and
>>toDecimal() calls if need be". More specifically you would need to call
>>toNumber/toDecimal on the inputs in order to change the return type. You
>>can't change the return type after the expression runs (without the
>>potential loss of information/precision of converting between the two)
>>because once the expression evaluator exits it needs to have already
>>interpreted it as a long or double which would occur before the user is
>>able to call toNumber/toDecimal. So the user would have to control the
>>return type would by calling toNumber/toDecimal on the inputs.
>>
>>That leads to the one confusing bit about this approach, when a user
>>attempts to do an operation on two whole numbers that they would expect to
>>return a decimal (like 8/5 = 1.6). In order to get their expected output
>>the user would have to know to explicitly convert at least one of them to a
>>double by using the toDecimal operation.
>>
>>That being said it's probably that's worth it to keep down the verbosity
>>and still have backwards compatibility.
>>
>>
>>Joe
>>- - - - - - Joseph Percivall
>>linkedin.com/in/Percivall
>>e: joepercivall@yahoo.com
>>
>>
>>
>>
>>On Thursday, April 21, 2016 10:20 AM, Mark Payne <ma...@hotmail.com>
>>wrote:
>>
>>
>>
>>Joe,
>>
>>I am definitely in favor of #3. I think if we perform some sort of binary
>>operation on two numbers,
>>we should return a Decimal type if either of the operands is a Decimal and
>>a Number (Long) type
>>if both operands are Longs. We would also need a toDecimal() method that
>>would convert a Number
>>type ot a Decimal type and we would need to support calling toNumber() on
>>a decimal as well.
>>
>>Given this, though, I think it makes sense to infer the return type and
>>allow the user to explicitly
>>change it via toNumber() and toDecimal() calls if need be. With the
>>addition of these functions, I don't
>>think it would be 'shady' to interpret the types at all. We already do
>>interpret types in many cases, for
>>instance when we have an expression such as ${num1:divide(${num2}) } -- in
>>this case, num1 and num2
>>are attributes, so they are strings. We already implicitly convert them to
>>numbers. Going forward, we should
>>convert them to either Number or Decimal type based on the regex that they
>>match. I believe this also
>>addresses the concern of not being able to override. I don't think
>>backward compatibility is an issue here
>>either, as we currently would just fail and throw an Exception if
>>attempting to divide a string like "2.3"
>>
>>Number 3 seems like a slam-dunk to me in terms of pros vs cons.
>>
>>Thanks
>>-Mark
>>
>>
>>
>>
>>On Apr 20, 2016, at 10:16 PM, Joe Percivall
>>><jo...@yahoo.com.INVALID> wrote:
>>
>>
>>>Hello Dev list,
>>>I've been working on a couple improvements that deal with utilizing
>>>expression language to do data analysis and this has exposed a couple
>>issues with the typing of numbers in Expression Language (EL). The
>>improvements are a continuation of the topic I presented on at the Apache
>>NiFi MeetUp group in MD[1].
>>
>>The primary issue I came across is that currently in EL all numbers
>>>interpreted as longs[2], which only store whole numbers. This leads to
>>problems when trying to do things like dividing whole numbers or just
>>trying to add/subtract decimals. I am actually surprised that the request
>>for decimals this hasn't come up before. That being said, after some
>>initial discussion with Tony and Joe, I believe that there are four
>>potential ways forward.
>>
>>1: Create a new EL type "decimal" backed by a double[3] and new methods
>>>to support it ("add_decimal"): This allows the user to explicitly choose
>>whether or not they want to use decimal or whole numbers. It retains the
>>simple use-cases that use whole numbers while opening up the new use-cases
>>using decimals. One down side is that it is more verbose. It means adding a
>>new function for each math operation. This is backwards compatible.
>>
>>
>>>2: Back all numbers by doubles instead of longs: The easy to implement
>>>and retains very concise methods ("add", "subtract", etc..). A few cons,
>>doubles have a lower precision than longs[4],  can lead to rounding
>>errors[5] and could be confusing for users who only work with whole numbers
>>to suddenly find decimals.This is not backwards compatible.
>>
>>3: Create a new EL type "decimal" back by a double[3] and attempt to
>>>smartly automatically cast depending on method/input/output: This would
>>allow for the positives of having decimals and whole numbers in addition to
>>having concise methods. The main cons being a much longer implementation
>>time to do it right and the "shadiness" of doing things automatically for
>>the user. Also this would mean the user wouldn't have the option to
>>explicitly override  This is not backwards compatible.
>>
>>4: Create a new EL type "decimal" backed by a double[3] and overload the
>>>existing methods with an additional parameter to specify return type to
>>support it: This would allow for the positives of having decimals and whole
>>numbers in addition to having concise method names but this may cause
>>confusion with less technical users who aren't used to specifying return
>>types. This is backwards compatible.
>>
>>
>>>The options that are not backwards compatible would need to wait to be
>>>implemented until 1.0.
>>
>>The current option I am leaning towards is number 1 due to the
>>>explicitness and greater control it gives the user. While it is more
>>verbose I think the decimal vs whole number syntax will be easy for even
>>non-technical users to pick up. Also I currently have a PR for it up
>>here[6].
>>
>>Any other ideas or suggestions are welcome!
>>>[1] http://www.meetup.com/ApacheNiFi/events/229158777/
>>>[2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
>>>[3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
>>>[4]
>>>https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
>>
>>[5] http://stackoverflow.com/questions/960072/rounding-errors
>>>[6] https://issues.apache.org/jira/browse/NIFI-1662
>>>Joe- - - - - - Joseph Percivalllinkedin.com/in/Percivalle:
>>>joepercivall@yahoo.com
>>
>>

Re: [Discuss] Expression Language Options to Add Decimal Support

Posted by Andy LoPresto <al...@apache.org>.
Mark made some good points in his support of option 3. I’ll try to list some of the scenarios I’m worried about, and if we can address each, I’m fine with that choice. For the rest of this message, “user” refers to the person writing the EL expression. They may or may not be a developer, but have read our existing EL documentation.

1. Loss of precision — currently there is an issue where division of Long values can result in a loss of precision. I have provided a failing test here [1]. I believe this would be resolved by any of the proposed solutions, but the auto-infer mechanism of option 3 would be the most concise.
2. Auto-casting — I’m not sure how the “auto-interpretation” would occur. Would it only happen on the result of mathematical operations? Would “${1.2:toNumber()}” return “1” as a Long or “1.2” as a Decimal? What about “${“1-2”:replace(“-“, “.”)}”? Is this a String with contents “1.2” or a Number (either Long or double)? The current “single numeric type” language has (I imagine) conditioned users to associate any numerical value with the Number type. I believe splitting decimal values out to “Decimal” type is correct, but will require very clear documentation and probably a concerted education effort for long-time users. As Joe pointed out, if a user wanted a decimal result of division, they would need to specify a decimal dividend or divisor, rather than operate on the quotient. This is accepted practice by developers, but is unlikely to be familiar with non-developers (i.e. writing “${“10”:divide(“3”):toDecimal()}” would result in “3”; “${“10”:divide(“3”:toDecimal())}” or ${“10”:toDecimal():divide(“3”)}” are necessary to get a result of “3.33…”)

I think as long as the user experience and concise and explicit documentation is held in paramount, I am ok with option 3. I just see the most potential for user confusion there, especially after some recent mailing list questions about the expression language functions and documentation.

[1] https://github.com/alopresto/nifi/blob/el_double/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy#L108 <https://github.com/alopresto/nifi/blob/el_double/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy#L108>




Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Apr 25, 2016, at 7:22 AM, Pierre Villard <pi...@gmail.com> wrote:
> 
> With explanations from Mark, option 3 is fine with me especially as it is
> backward compatible.
> 
> Pierre
> 
> 2016-04-25 16:14 GMT+02:00 Joe Percivall <jo...@yahoo.com.invalid>:
> 
>> Pierre and Andy,
>> 
>> How do you guys feel about option 3 as Mark has explained it? Keeping in
>> mind that it would be backwards compatible. As it stands we have differing
>> opinions, which is a good thing, but we need to reach an agreement in order
>> to proceed.
>> 
>> Joe
>> - - - - - -
>> Joseph Percivall
>> linkedin.com/in/Percivall
>> e: joepercivall@yahoo.com
>> 
>> 
>> 
>> 
>> On Thursday, April 21, 2016 11:12 AM, Joe Percivall
>> <jo...@yahoo.com.INVALID> wrote:
>> Mark,
>> 
>> 
>> After thinking through this a bit more I believe you're right. We can
>> assume this logic for every operation:
>> 
>> Two whole numbers as input returns a whole number
>> A whole number and a decimal as input returns a decimal
>> Two decimals as input returns a Decimal
>> 
>> I believe, one problem with your explanation as written is "...infer the
>> return type and allow the user to explicitly change it via toNumber() and
>> toDecimal() calls if need be". More specifically you would need to call
>> toNumber/toDecimal on the inputs in order to change the return type. You
>> can't change the return type after the expression runs (without the
>> potential loss of information/precision of converting between the two)
>> because once the expression evaluator exits it needs to have already
>> interpreted it as a long or double which would occur before the user is
>> able to call toNumber/toDecimal. So the user would have to control the
>> return type would by calling toNumber/toDecimal on the inputs.
>> 
>> That leads to the one confusing bit about this approach, when a user
>> attempts to do an operation on two whole numbers that they would expect to
>> return a decimal (like 8/5 = 1.6). In order to get their expected output
>> the user would have to know to explicitly convert at least one of them to a
>> double by using the toDecimal operation.
>> 
>> That being said it's probably that's worth it to keep down the verbosity
>> and still have backwards compatibility.
>> 
>> 
>> Joe
>> - - - - - - Joseph Percivall
>> linkedin.com/in/Percivall
>> e: joepercivall@yahoo.com
>> 
>> 
>> 
>> 
>> On Thursday, April 21, 2016 10:20 AM, Mark Payne <ma...@hotmail.com>
>> wrote:
>> 
>> 
>> 
>> Joe,
>> 
>> I am definitely in favor of #3. I think if we perform some sort of binary
>> operation on two numbers,
>> we should return a Decimal type if either of the operands is a Decimal and
>> a Number (Long) type
>> if both operands are Longs. We would also need a toDecimal() method that
>> would convert a Number
>> type ot a Decimal type and we would need to support calling toNumber() on
>> a decimal as well.
>> 
>> Given this, though, I think it makes sense to infer the return type and
>> allow the user to explicitly
>> change it via toNumber() and toDecimal() calls if need be. With the
>> addition of these functions, I don't
>> think it would be 'shady' to interpret the types at all. We already do
>> interpret types in many cases, for
>> instance when we have an expression such as ${num1:divide(${num2}) } -- in
>> this case, num1 and num2
>> are attributes, so they are strings. We already implicitly convert them to
>> numbers. Going forward, we should
>> convert them to either Number or Decimal type based on the regex that they
>> match. I believe this also
>> addresses the concern of not being able to override. I don't think
>> backward compatibility is an issue here
>> either, as we currently would just fail and throw an Exception if
>> attempting to divide a string like "2.3"
>> 
>> Number 3 seems like a slam-dunk to me in terms of pros vs cons.
>> 
>> Thanks
>> -Mark
>> 
>> 
>> 
>>> On Apr 20, 2016, at 10:16 PM, Joe Percivall
>> <jo...@yahoo.com.INVALID> wrote:
>>> 
>>> Hello Dev list,
>>> I've been working on a couple improvements that deal with utilizing
>> expression language to do data analysis and this has exposed a couple
>> issues with the typing of numbers in Expression Language (EL). The
>> improvements are a continuation of the topic I presented on at the Apache
>> NiFi MeetUp group in MD[1].
>>> The primary issue I came across is that currently in EL all numbers
>> interpreted as longs[2], which only store whole numbers. This leads to
>> problems when trying to do things like dividing whole numbers or just
>> trying to add/subtract decimals. I am actually surprised that the request
>> for decimals this hasn't come up before. That being said, after some
>> initial discussion with Tony and Joe, I believe that there are four
>> potential ways forward.
>>> 1: Create a new EL type "decimal" backed by a double[3] and new methods
>> to support it ("add_decimal"): This allows the user to explicitly choose
>> whether or not they want to use decimal or whole numbers. It retains the
>> simple use-cases that use whole numbers while opening up the new use-cases
>> using decimals. One down side is that it is more verbose. It means adding a
>> new function for each math operation. This is backwards compatible.
>>> 
>>> 2: Back all numbers by doubles instead of longs: The easy to implement
>> and retains very concise methods ("add", "subtract", etc..). A few cons,
>> doubles have a lower precision than longs[4],  can lead to rounding
>> errors[5] and could be confusing for users who only work with whole numbers
>> to suddenly find decimals.This is not backwards compatible.
>>> 3: Create a new EL type "decimal" back by a double[3] and attempt to
>> smartly automatically cast depending on method/input/output: This would
>> allow for the positives of having decimals and whole numbers in addition to
>> having concise methods. The main cons being a much longer implementation
>> time to do it right and the "shadiness" of doing things automatically for
>> the user. Also this would mean the user wouldn't have the option to
>> explicitly override  This is not backwards compatible.
>>> 4: Create a new EL type "decimal" backed by a double[3] and overload the
>> existing methods with an additional parameter to specify return type to
>> support it: This would allow for the positives of having decimals and whole
>> numbers in addition to having concise method names but this may cause
>> confusion with less technical users who aren't used to specifying return
>> types. This is backwards compatible.
>>> 
>>> The options that are not backwards compatible would need to wait to be
>> implemented until 1.0.
>>> The current option I am leaning towards is number 1 due to the
>> explicitness and greater control it gives the user. While it is more
>> verbose I think the decimal vs whole number syntax will be easy for even
>> non-technical users to pick up. Also I currently have a PR for it up
>> here[6].
>>> Any other ideas or suggestions are welcome!
>>> [1] http://www.meetup.com/ApacheNiFi/events/229158777/
>>> [2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
>>> [3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
>>> [4]
>> https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
>>> [5] http://stackoverflow.com/questions/960072/rounding-errors
>>> [6] https://issues.apache.org/jira/browse/NIFI-1662
>>> Joe- - - - - - Joseph Percivalllinkedin.com/in/Percivalle:
>> joepercivall@yahoo.com
>> 


Re: [Discuss] Expression Language Options to Add Decimal Support

Posted by Pierre Villard <pi...@gmail.com>.
With explanations from Mark, option 3 is fine with me especially as it is
backward compatible.

Pierre

2016-04-25 16:14 GMT+02:00 Joe Percivall <jo...@yahoo.com.invalid>:

> Pierre and Andy,
>
> How do you guys feel about option 3 as Mark has explained it? Keeping in
> mind that it would be backwards compatible. As it stands we have differing
> opinions, which is a good thing, but we need to reach an agreement in order
> to proceed.
>
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com
>
>
>
>
> On Thursday, April 21, 2016 11:12 AM, Joe Percivall
> <jo...@yahoo.com.INVALID> wrote:
> Mark,
>
>
> After thinking through this a bit more I believe you're right. We can
> assume this logic for every operation:
>
> Two whole numbers as input returns a whole number
> A whole number and a decimal as input returns a decimal
> Two decimals as input returns a Decimal
>
> I believe, one problem with your explanation as written is "...infer the
> return type and allow the user to explicitly change it via toNumber() and
> toDecimal() calls if need be". More specifically you would need to call
> toNumber/toDecimal on the inputs in order to change the return type. You
> can't change the return type after the expression runs (without the
> potential loss of information/precision of converting between the two)
> because once the expression evaluator exits it needs to have already
> interpreted it as a long or double which would occur before the user is
> able to call toNumber/toDecimal. So the user would have to control the
> return type would by calling toNumber/toDecimal on the inputs.
>
> That leads to the one confusing bit about this approach, when a user
> attempts to do an operation on two whole numbers that they would expect to
> return a decimal (like 8/5 = 1.6). In order to get their expected output
> the user would have to know to explicitly convert at least one of them to a
> double by using the toDecimal operation.
>
> That being said it's probably that's worth it to keep down the verbosity
> and still have backwards compatibility.
>
>
> Joe
> - - - - - - Joseph Percivall
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com
>
>
>
>
> On Thursday, April 21, 2016 10:20 AM, Mark Payne <ma...@hotmail.com>
> wrote:
>
>
>
> Joe,
>
> I am definitely in favor of #3. I think if we perform some sort of binary
> operation on two numbers,
> we should return a Decimal type if either of the operands is a Decimal and
> a Number (Long) type
> if both operands are Longs. We would also need a toDecimal() method that
> would convert a Number
> type ot a Decimal type and we would need to support calling toNumber() on
> a decimal as well.
>
> Given this, though, I think it makes sense to infer the return type and
> allow the user to explicitly
> change it via toNumber() and toDecimal() calls if need be. With the
> addition of these functions, I don't
> think it would be 'shady' to interpret the types at all. We already do
> interpret types in many cases, for
> instance when we have an expression such as ${num1:divide(${num2}) } -- in
> this case, num1 and num2
> are attributes, so they are strings. We already implicitly convert them to
> numbers. Going forward, we should
> convert them to either Number or Decimal type based on the regex that they
> match. I believe this also
> addresses the concern of not being able to override. I don't think
> backward compatibility is an issue here
> either, as we currently would just fail and throw an Exception if
> attempting to divide a string like "2.3"
>
> Number 3 seems like a slam-dunk to me in terms of pros vs cons.
>
> Thanks
> -Mark
>
>
>
> > On Apr 20, 2016, at 10:16 PM, Joe Percivall
> <jo...@yahoo.com.INVALID> wrote:
> >
> > Hello Dev list,
> > I've been working on a couple improvements that deal with utilizing
> expression language to do data analysis and this has exposed a couple
> issues with the typing of numbers in Expression Language (EL). The
> improvements are a continuation of the topic I presented on at the Apache
> NiFi MeetUp group in MD[1].
> > The primary issue I came across is that currently in EL all numbers
> interpreted as longs[2], which only store whole numbers. This leads to
> problems when trying to do things like dividing whole numbers or just
> trying to add/subtract decimals. I am actually surprised that the request
> for decimals this hasn't come up before. That being said, after some
> initial discussion with Tony and Joe, I believe that there are four
> potential ways forward.
> > 1: Create a new EL type "decimal" backed by a double[3] and new methods
> to support it ("add_decimal"): This allows the user to explicitly choose
> whether or not they want to use decimal or whole numbers. It retains the
> simple use-cases that use whole numbers while opening up the new use-cases
> using decimals. One down side is that it is more verbose. It means adding a
> new function for each math operation. This is backwards compatible.
> >
> > 2: Back all numbers by doubles instead of longs: The easy to implement
> and retains very concise methods ("add", "subtract", etc..). A few cons,
> doubles have a lower precision than longs[4],  can lead to rounding
> errors[5] and could be confusing for users who only work with whole numbers
> to suddenly find decimals.This is not backwards compatible.
> > 3: Create a new EL type "decimal" back by a double[3] and attempt to
> smartly automatically cast depending on method/input/output: This would
> allow for the positives of having decimals and whole numbers in addition to
> having concise methods. The main cons being a much longer implementation
> time to do it right and the "shadiness" of doing things automatically for
> the user. Also this would mean the user wouldn't have the option to
> explicitly override  This is not backwards compatible.
> > 4: Create a new EL type "decimal" backed by a double[3] and overload the
> existing methods with an additional parameter to specify return type to
> support it: This would allow for the positives of having decimals and whole
> numbers in addition to having concise method names but this may cause
> confusion with less technical users who aren't used to specifying return
> types. This is backwards compatible.
> >
> > The options that are not backwards compatible would need to wait to be
> implemented until 1.0.
> > The current option I am leaning towards is number 1 due to the
> explicitness and greater control it gives the user. While it is more
> verbose I think the decimal vs whole number syntax will be easy for even
> non-technical users to pick up. Also I currently have a PR for it up
> here[6].
> > Any other ideas or suggestions are welcome!
> > [1] http://www.meetup.com/ApacheNiFi/events/229158777/
> > [2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
> > [3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
> > [4]
> https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
> > [5] http://stackoverflow.com/questions/960072/rounding-errors
> > [6] https://issues.apache.org/jira/browse/NIFI-1662
> > Joe- - - - - - Joseph Percivalllinkedin.com/in/Percivalle:
> joepercivall@yahoo.com
>

Re: [Discuss] Expression Language Options to Add Decimal Support

Posted by Joe Percivall <jo...@yahoo.com.INVALID>.
Pierre and Andy,

How do you guys feel about option 3 as Mark has explained it? Keeping in mind that it would be backwards compatible. As it stands we have differing opinions, which is a good thing, but we need to reach an agreement in order to proceed.
 
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com




On Thursday, April 21, 2016 11:12 AM, Joe Percivall <jo...@yahoo.com.INVALID> wrote:
Mark,


After thinking through this a bit more I believe you're right. We can assume this logic for every operation:

Two whole numbers as input returns a whole number
A whole number and a decimal as input returns a decimal
Two decimals as input returns a Decimal

I believe, one problem with your explanation as written is "...infer the return type and allow the user to explicitly change it via toNumber() and toDecimal() calls if need be". More specifically you would need to call toNumber/toDecimal on the inputs in order to change the return type. You can't change the return type after the expression runs (without the potential loss of information/precision of converting between the two) because once the expression evaluator exits it needs to have already interpreted it as a long or double which would occur before the user is able to call toNumber/toDecimal. So the user would have to control the return type would by calling toNumber/toDecimal on the inputs. 

That leads to the one confusing bit about this approach, when a user attempts to do an operation on two whole numbers that they would expect to return a decimal (like 8/5 = 1.6). In order to get their expected output the user would have to know to explicitly convert at least one of them to a double by using the toDecimal operation. 

That being said it's probably that's worth it to keep down the verbosity and still have backwards compatibility. 


Joe
- - - - - - Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com




On Thursday, April 21, 2016 10:20 AM, Mark Payne <ma...@hotmail.com> wrote:



Joe,

I am definitely in favor of #3. I think if we perform some sort of binary operation on two numbers,
we should return a Decimal type if either of the operands is a Decimal and a Number (Long) type
if both operands are Longs. We would also need a toDecimal() method that would convert a Number
type ot a Decimal type and we would need to support calling toNumber() on a decimal as well.

Given this, though, I think it makes sense to infer the return type and allow the user to explicitly
change it via toNumber() and toDecimal() calls if need be. With the addition of these functions, I don't
think it would be 'shady' to interpret the types at all. We already do interpret types in many cases, for
instance when we have an expression such as ${num1:divide(${num2}) } -- in this case, num1 and num2
are attributes, so they are strings. We already implicitly convert them to numbers. Going forward, we should
convert them to either Number or Decimal type based on the regex that they match. I believe this also
addresses the concern of not being able to override. I don't think backward compatibility is an issue here
either, as we currently would just fail and throw an Exception if attempting to divide a string like "2.3"

Number 3 seems like a slam-dunk to me in terms of pros vs cons.

Thanks
-Mark



> On Apr 20, 2016, at 10:16 PM, Joe Percivall <jo...@yahoo.com.INVALID> wrote:
> 
> Hello Dev list,
> I've been working on a couple improvements that deal with utilizing expression language to do data analysis and this has exposed a couple issues with the typing of numbers in Expression Language (EL). The improvements are a continuation of the topic I presented on at the Apache NiFi MeetUp group in MD[1]. 
> The primary issue I came across is that currently in EL all numbers interpreted as longs[2], which only store whole numbers. This leads to problems when trying to do things like dividing whole numbers or just trying to add/subtract decimals. I am actually surprised that the request for decimals this hasn't come up before. That being said, after some initial discussion with Tony and Joe, I believe that there are four potential ways forward.
> 1: Create a new EL type "decimal" backed by a double[3] and new methods to support it ("add_decimal"): This allows the user to explicitly choose whether or not they want to use decimal or whole numbers. It retains the simple use-cases that use whole numbers while opening up the new use-cases using decimals. One down side is that it is more verbose. It means adding a new function for each math operation. This is backwards compatible.
> 
> 2: Back all numbers by doubles instead of longs: The easy to implement and retains very concise methods ("add", "subtract", etc..). A few cons, doubles have a lower precision than longs[4],  can lead to rounding errors[5] and could be confusing for users who only work with whole numbers to suddenly find decimals.This is not backwards compatible.
> 3: Create a new EL type "decimal" back by a double[3] and attempt to smartly automatically cast depending on method/input/output: This would allow for the positives of having decimals and whole numbers in addition to having concise methods. The main cons being a much longer implementation time to do it right and the "shadiness" of doing things automatically for the user. Also this would mean the user wouldn't have the option to explicitly override  This is not backwards compatible.
> 4: Create a new EL type "decimal" backed by a double[3] and overload the existing methods with an additional parameter to specify return type to support it: This would allow for the positives of having decimals and whole numbers in addition to having concise method names but this may cause confusion with less technical users who aren't used to specifying return types. This is backwards compatible.
> 
> The options that are not backwards compatible would need to wait to be implemented until 1.0.
> The current option I am leaning towards is number 1 due to the explicitness and greater control it gives the user. While it is more verbose I think the decimal vs whole number syntax will be easy for even non-technical users to pick up. Also I currently have a PR for it up here[6].
> Any other ideas or suggestions are welcome!
> [1] http://www.meetup.com/ApacheNiFi/events/229158777/
> [2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
> [3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
> [4] https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
> [5] http://stackoverflow.com/questions/960072/rounding-errors
> [6] https://issues.apache.org/jira/browse/NIFI-1662
> Joe- - - - - - Joseph Percivalllinkedin.com/in/Percivalle: joepercivall@yahoo.com

Re: [Discuss] Expression Language Options to Add Decimal Support

Posted by Joe Percivall <jo...@yahoo.com.INVALID>.
Mark,


After thinking through this a bit more I believe you're right. We can assume this logic for every operation:

Two whole numbers as input returns a whole number
A whole number and a decimal as input returns a decimal
Two decimals as input returns a Decimal

I believe, one problem with your explanation as written is "...infer the return type and allow the user to explicitly change it via toNumber() and toDecimal() calls if need be". More specifically you would need to call toNumber/toDecimal on the inputs in order to change the return type. You can't change the return type after the expression runs (without the potential loss of information/precision of converting between the two) because once the expression evaluator exits it needs to have already interpreted it as a long or double which would occur before the user is able to call toNumber/toDecimal. So the user would have to control the return type would by calling toNumber/toDecimal on the inputs. 

That leads to the one confusing bit about this approach, when a user attempts to do an operation on two whole numbers that they would expect to return a decimal (like 8/5 = 1.6). In order to get their expected output the user would have to know to explicitly convert at least one of them to a double by using the toDecimal operation. 

That being said it's probably that's worth it to keep down the verbosity and still have backwards compatibility. 
 

Joe
- - - - - - Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com



On Thursday, April 21, 2016 10:20 AM, Mark Payne <ma...@hotmail.com> wrote:



Joe,

I am definitely in favor of #3. I think if we perform some sort of binary operation on two numbers,
we should return a Decimal type if either of the operands is a Decimal and a Number (Long) type
if both operands are Longs. We would also need a toDecimal() method that would convert a Number
type ot a Decimal type and we would need to support calling toNumber() on a decimal as well.

Given this, though, I think it makes sense to infer the return type and allow the user to explicitly
change it via toNumber() and toDecimal() calls if need be. With the addition of these functions, I don't
think it would be 'shady' to interpret the types at all. We already do interpret types in many cases, for
instance when we have an expression such as ${num1:divide(${num2}) } -- in this case, num1 and num2
are attributes, so they are strings. We already implicitly convert them to numbers. Going forward, we should
convert them to either Number or Decimal type based on the regex that they match. I believe this also
addresses the concern of not being able to override. I don't think backward compatibility is an issue here
either, as we currently would just fail and throw an Exception if attempting to divide a string like "2.3"

Number 3 seems like a slam-dunk to me in terms of pros vs cons.

Thanks
-Mark



> On Apr 20, 2016, at 10:16 PM, Joe Percivall <jo...@yahoo.com.INVALID> wrote:
> 
> Hello Dev list,
> I've been working on a couple improvements that deal with utilizing expression language to do data analysis and this has exposed a couple issues with the typing of numbers in Expression Language (EL). The improvements are a continuation of the topic I presented on at the Apache NiFi MeetUp group in MD[1]. 
> The primary issue I came across is that currently in EL all numbers interpreted as longs[2], which only store whole numbers. This leads to problems when trying to do things like dividing whole numbers or just trying to add/subtract decimals. I am actually surprised that the request for decimals this hasn't come up before. That being said, after some initial discussion with Tony and Joe, I believe that there are four potential ways forward.
> 1: Create a new EL type "decimal" backed by a double[3] and new methods to support it ("add_decimal"): This allows the user to explicitly choose whether or not they want to use decimal or whole numbers. It retains the simple use-cases that use whole numbers while opening up the new use-cases using decimals. One down side is that it is more verbose. It means adding a new function for each math operation. This is backwards compatible.
> 
> 2: Back all numbers by doubles instead of longs: The easy to implement and retains very concise methods ("add", "subtract", etc..). A few cons, doubles have a lower precision than longs[4],  can lead to rounding errors[5] and could be confusing for users who only work with whole numbers to suddenly find decimals.This is not backwards compatible.
> 3: Create a new EL type "decimal" back by a double[3] and attempt to smartly automatically cast depending on method/input/output: This would allow for the positives of having decimals and whole numbers in addition to having concise methods. The main cons being a much longer implementation time to do it right and the "shadiness" of doing things automatically for the user. Also this would mean the user wouldn't have the option to explicitly override  This is not backwards compatible.
> 4: Create a new EL type "decimal" backed by a double[3] and overload the existing methods with an additional parameter to specify return type to support it: This would allow for the positives of having decimals and whole numbers in addition to having concise method names but this may cause confusion with less technical users who aren't used to specifying return types. This is backwards compatible.
> 
> The options that are not backwards compatible would need to wait to be implemented until 1.0.
> The current option I am leaning towards is number 1 due to the explicitness and greater control it gives the user. While it is more verbose I think the decimal vs whole number syntax will be easy for even non-technical users to pick up. Also I currently have a PR for it up here[6].
> Any other ideas or suggestions are welcome!
> [1] http://www.meetup.com/ApacheNiFi/events/229158777/
> [2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
> [3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
> [4] https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
> [5] http://stackoverflow.com/questions/960072/rounding-errors
> [6] https://issues.apache.org/jira/browse/NIFI-1662
> Joe- - - - - - Joseph Percivalllinkedin.com/in/Percivalle: joepercivall@yahoo.com

Re: [Discuss] Expression Language Options to Add Decimal Support

Posted by Mark Payne <ma...@hotmail.com>.
Joe,

I am definitely in favor of #3. I think if we perform some sort of binary operation on two numbers,
we should return a Decimal type if either of the operands is a Decimal and a Number (Long) type
if both operands are Longs. We would also need a toDecimal() method that would convert a Number
type ot a Decimal type and we would need to support calling toNumber() on a decimal as well.

Given this, though, I think it makes sense to infer the return type and allow the user to explicitly
change it via toNumber() and toDecimal() calls if need be. With the addition of these functions, I don't
think it would be 'shady' to interpret the types at all. We already do interpret types in many cases, for
instance when we have an expression such as ${num1:divide(${num2}) } -- in this case, num1 and num2
are attributes, so they are strings. We already implicitly convert them to numbers. Going forward, we should
convert them to either Number or Decimal type based on the regex that they match. I believe this also
addresses the concern of not being able to override. I don't think backward compatibility is an issue here
either, as we currently would just fail and throw an Exception if attempting to divide a string like "2.3"

Number 3 seems like a slam-dunk to me in terms of pros vs cons.

Thanks
-Mark


> On Apr 20, 2016, at 10:16 PM, Joe Percivall <jo...@yahoo.com.INVALID> wrote:
> 
> Hello Dev list,
> I've been working on a couple improvements that deal with utilizing expression language to do data analysis and this has exposed a couple issues with the typing of numbers in Expression Language (EL). The improvements are a continuation of the topic I presented on at the Apache NiFi MeetUp group in MD[1]. 
> The primary issue I came across is that currently in EL all numbers interpreted as longs[2], which only store whole numbers. This leads to problems when trying to do things like dividing whole numbers or just trying to add/subtract decimals. I am actually surprised that the request for decimals this hasn't come up before. That being said, after some initial discussion with Tony and Joe, I believe that there are four potential ways forward.
> 1: Create a new EL type "decimal" backed by a double[3] and new methods to support it ("add_decimal"): This allows the user to explicitly choose whether or not they want to use decimal or whole numbers. It retains the simple use-cases that use whole numbers while opening up the new use-cases using decimals. One down side is that it is more verbose. It means adding a new function for each math operation. This is backwards compatible.
> 
> 2: Back all numbers by doubles instead of longs: The easy to implement and retains very concise methods ("add", "subtract", etc..). A few cons, doubles have a lower precision than longs[4],  can lead to rounding errors[5] and could be confusing for users who only work with whole numbers to suddenly find decimals.This is not backwards compatible.
> 3: Create a new EL type "decimal" back by a double[3] and attempt to smartly automatically cast depending on method/input/output: This would allow for the positives of having decimals and whole numbers in addition to having concise methods. The main cons being a much longer implementation time to do it right and the "shadiness" of doing things automatically for the user. Also this would mean the user wouldn't have the option to explicitly override  This is not backwards compatible.
> 4: Create a new EL type "decimal" backed by a double[3] and overload the existing methods with an additional parameter to specify return type to support it: This would allow for the positives of having decimals and whole numbers in addition to having concise method names but this may cause confusion with less technical users who aren't used to specifying return types. This is backwards compatible.
> 
> The options that are not backwards compatible would need to wait to be implemented until 1.0.
> The current option I am leaning towards is number 1 due to the explicitness and greater control it gives the user. While it is more verbose I think the decimal vs whole number syntax will be easy for even non-technical users to pick up. Also I currently have a PR for it up here[6].
> Any other ideas or suggestions are welcome!
> [1] http://www.meetup.com/ApacheNiFi/events/229158777/
> [2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
> [3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
> [4] https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
> [5] http://stackoverflow.com/questions/960072/rounding-errors
> [6] https://issues.apache.org/jira/browse/NIFI-1662
> Joe- - - - - - Joseph Percivalllinkedin.com/in/Percivalle: joepercivall@yahoo.com


Re: [Discuss] Expression Language Options to Add Decimal Support

Posted by Pierre Villard <pi...@gmail.com>.
Hi Joe,

In my opinion, I think I would feel more "natural" to go with option 4.
Option 1 is fine as well but I think that adding a lot of functions could
be confusing with the auto-completion feature.


2016-04-21 4:47 GMT+02:00 Andy LoPresto <al...@apache.org>:

> I would support option 4, then option 1. I think this is a significant
> change, so backward-compatibility is important, and I don’t see specifying
> the return type to be a high hurdle provided the documentation is clear. If
> someone is writing in expression language, there is a minimum expectation
> of exposure to the documentation, as it is a bespoke implementation, not an
> OTS offering that they would have previous experience with.
>
> I think options 2 and 3 introduce the opportunity to compromise
> previously-functioning expressions in a way that could be very difficult to
> troubleshoot.
>
> I do appreciate you discovering and diagnosing this. I can imagine there
> are already scenarios where EL is failing silently and this should
> definitely be fixed correctly.
>
>
> Andy LoPresto
> alopresto@apache.org
> *alopresto.apache@gmail.com <al...@gmail.com>*
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Apr 20, 2016, at 7:16 PM, Joe Percivall <joepercivall@yahoo.com.INVALID
> <jo...@yahoo.com.invalid>> wrote:
>
> Hello Dev list,
> I've been working on a couple improvements that deal with utilizing
> expression language to do data analysis and this has exposed a couple
> issues with the typing of numbers in Expression Language (EL). The
> improvements are a continuation of the topic I presented on at the Apache
> NiFi MeetUp group in MD[1].
> The primary issue I came across is that currently in EL all numbers
> interpreted as longs[2], which only store whole numbers. This leads to
> problems when trying to do things like dividing whole numbers or just
> trying to add/subtract decimals. I am actually surprised that the request
> for decimals this hasn't come up before. That being said, after some
> initial discussion with Tony and Joe, I believe that there are four
> potential ways forward.
> 1: Create a new EL type "decimal" backed by a double[3] and new methods to
> support it ("add_decimal"): This allows the user to explicitly choose
> whether or not they want to use decimal or whole numbers. It retains the
> simple use-cases that use whole numbers while opening up the new use-cases
> using decimals. One down side is that it is more verbose. It means adding a
> new function for each math operation. This is backwards compatible.
>
> 2: Back all numbers by doubles instead of longs: The easy to implement and
> retains very concise methods ("add", "subtract", etc..). A few cons,
> doubles have a lower precision than longs[4],  can lead to rounding
> errors[5] and could be confusing for users who only work with whole numbers
> to suddenly find decimals.This is not backwards compatible.
> 3: Create a new EL type "decimal" back by a double[3] and attempt to
> smartly automatically cast depending on method/input/output: This would
> allow for the positives of having decimals and whole numbers in addition to
> having concise methods. The main cons being a much longer implementation
> time to do it right and the "shadiness" of doing things automatically for
> the user. Also this would mean the user wouldn't have the option to
> explicitly override  This is not backwards compatible.
> 4: Create a new EL type "decimal" backed by a double[3] and overload the
> existing methods with an additional parameter to specify return type to
> support it: This would allow for the positives of having decimals and whole
> numbers in addition to having concise method names but this may cause
> confusion with less technical users who aren't used to specifying return
> types. This is backwards compatible.
>
> The options that are not backwards compatible would need to wait to be
> implemented until 1.0.
> The current option I am leaning towards is number 1 due to the
> explicitness and greater control it gives the user. While it is more
> verbose I think the decimal vs whole number syntax will be easy for even
> non-technical users to pick up. Also I currently have a PR for it up
> here[6].
> Any other ideas or suggestions are welcome!
> [1] http://www.meetup.com/ApacheNiFi/events/229158777/
> [2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
> [3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
> [4]
> https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
> [5] http://stackoverflow.com/questions/960072/rounding-errors
> [6] https://issues.apache.org/jira/browse/NIFI-1662
> Joe- - - - - - Joseph Percivalllinkedin.com/in/Percivalle:
> <http://percivalllinkedin.com/in/Percivalle:> joepercivall@yahoo.com
>
>
>

Re: [Discuss] Expression Language Options to Add Decimal Support

Posted by Andy LoPresto <al...@apache.org>.
I would support option 4, then option 1. I think this is a significant change, so backward-compatibility is important, and I don’t see specifying the return type to be a high hurdle provided the documentation is clear. If someone is writing in expression language, there is a minimum expectation of exposure to the documentation, as it is a bespoke implementation, not an OTS offering that they would have previous experience with.

I think options 2 and 3 introduce the opportunity to compromise previously-functioning expressions in a way that could be very difficult to troubleshoot.

I do appreciate you discovering and diagnosing this. I can imagine there are already scenarios where EL is failing silently and this should definitely be fixed correctly.


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Apr 20, 2016, at 7:16 PM, Joe Percivall <jo...@yahoo.com.INVALID> wrote:
> 
> Hello Dev list,
> I've been working on a couple improvements that deal with utilizing expression language to do data analysis and this has exposed a couple issues with the typing of numbers in Expression Language (EL). The improvements are a continuation of the topic I presented on at the Apache NiFi MeetUp group in MD[1].
> The primary issue I came across is that currently in EL all numbers interpreted as longs[2], which only store whole numbers. This leads to problems when trying to do things like dividing whole numbers or just trying to add/subtract decimals. I am actually surprised that the request for decimals this hasn't come up before. That being said, after some initial discussion with Tony and Joe, I believe that there are four potential ways forward.
> 1: Create a new EL type "decimal" backed by a double[3] and new methods to support it ("add_decimal"): This allows the user to explicitly choose whether or not they want to use decimal or whole numbers. It retains the simple use-cases that use whole numbers while opening up the new use-cases using decimals. One down side is that it is more verbose. It means adding a new function for each math operation. This is backwards compatible.
> 
> 2: Back all numbers by doubles instead of longs: The easy to implement and retains very concise methods ("add", "subtract", etc..). A few cons, doubles have a lower precision than longs[4],  can lead to rounding errors[5] and could be confusing for users who only work with whole numbers to suddenly find decimals.This is not backwards compatible.
> 3: Create a new EL type "decimal" back by a double[3] and attempt to smartly automatically cast depending on method/input/output: This would allow for the positives of having decimals and whole numbers in addition to having concise methods. The main cons being a much longer implementation time to do it right and the "shadiness" of doing things automatically for the user. Also this would mean the user wouldn't have the option to explicitly override  This is not backwards compatible.
> 4: Create a new EL type "decimal" backed by a double[3] and overload the existing methods with an additional parameter to specify return type to support it: This would allow for the positives of having decimals and whole numbers in addition to having concise method names but this may cause confusion with less technical users who aren't used to specifying return types. This is backwards compatible.
> 
> The options that are not backwards compatible would need to wait to be implemented until 1.0.
> The current option I am leaning towards is number 1 due to the explicitness and greater control it gives the user. While it is more verbose I think the decimal vs whole number syntax will be easy for even non-technical users to pick up. Also I currently have a PR for it up here[6].
> Any other ideas or suggestions are welcome!
> [1] http://www.meetup.com/ApacheNiFi/events/229158777/
> [2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
> [3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
> [4] https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
> [5] http://stackoverflow.com/questions/960072/rounding-errors
> [6] https://issues.apache.org/jira/browse/NIFI-1662
> Joe- - - - - - Joseph Percivalllinkedin.com/in/Percivalle: joepercivall@yahoo.com