You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/02/26 22:32:49 UTC

[GitHub] [druid] sascha-coenen opened a new issue #9417: SQL: improve division behaviour

sascha-coenen opened a new issue #9417: SQL: improve division behaviour
URL: https://github.com/apache/druid/issues/9417
 
 
   ### Description
   
   Most databases implement the following behaviour to make it easy to formulate divisions and receive an expected outcome.
   
   Let me go over two or three differences between how Druid works relative to other SQL dialects and how this is leading to extremely long expressions even for simple cases of needing to compute a/b:
   
   `SELECT 1 / 2`
   in Vertica this evaluates to 0.5
   in Druid it evaluates to 0
   
   `SELECT 1 / 0`
   fails in Vertica and Druid
   
   `SELECT 1 / null`
   evaluates to null in Vertica
   in Druid it throws an exception.
   
   With the above behaviour, it is possible to formulate divisions that are null safe and also evaluate to correct results in a consice way:
   
   SUM(nominator) / NVL(SUM(denominator))
   
   In Druid, it is cumbersome that divisions are by default integer divisions and furthermore, it takes CASE statements to make a query robust against division-by-zero cases.
   
   So the simple intention a/b turns into:
   
   `SELECT CASE(b = 0.0) THEN NULL ELSE ( a / CAST(b AS DOUBLE) END`
   
   This becomes much worse if the denominator itself is a complex expression. In that case, it needs to be repeated in the CASE test redundantly. Our expressions for normal real-world conversion rates look like this:
   
   ```
   CASE ( SUM(validInteractionCount) FILTER(WHERE activityTypeId IN('o', 'p', 'f', 'm')) ) WHEN 0.0 THEN 0.0 ELSE SUM(validBidPriceSum) FILTER(WHERE activityTypeId IN('o', 'p', 'f', 'm')) / CAST(SUM(validInteractionCount) FILTER(WHERE activityTypeId IN('o', 'p', 'f', 'm')) AS DOUBLE) END
   ```
   
   This could be simplified to the following, if the behaviour of division and null handling was as with other SQL database systems:
   
   ```
   SUM(validBidPriceSum) FILTER(WHERE activityTypeId IN('o', 'p', 'f', 'm') / NVL(SUM(validInteractionCount) FILTER(WHERE activityTypeId IN('o', 'p', 'f', 'm'))
   ```
   
   Ideally, one would just change the behaviour to be compatible with the above but as I anticipate that this would be rejected due to breaking backwards compatibility, I would alternatively propose to have a DIV() function that deals with division-by-zero and type conversion in a concise way
   
   ```
   DIV(nominator, denominator, null-fallback)
   
   e.g
   
   DIV(SUM(validBidPriceSum), SUM(validInteractionCount), 0.0) FILTER(WHERE activityTypeId IN('o', 'p', 'f', 'm')
   ```
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] sascha-coenen commented on issue #9417: SQL: improve division behaviour

Posted by GitBox <gi...@apache.org>.
sascha-coenen commented on issue #9417: SQL: improve division behaviour
URL: https://github.com/apache/druid/issues/9417#issuecomment-591918423
 
 
   > Btw, I've never seen a one-arg NVL function.
   yeah. missing an arg. alternatively NULLIF(denominator, 0)
   
   > For the 1/NULL case, I think we should probably return NULL instead of throwing an error.
   that would be perfect 
   
   > Would adding that function solve your problem?
   yes, it would make things more concise.
   I mentioned the function DIV(nominator, denominator, null-fallback) above which is probably similar to their SAFE_DIVIDE function. 
   Having this as an infix operator rather than in function syntax would be even nicer so that there are less nesting levels, but I guess that's out of the question.
   
   Most importantly though, for this to be a perfectly useful function, it would have to divide numerically, meaning DIV(1, 2) should be 0.5 not 0.
   I would argue that user experience trumps technical arguments. So in the real world the expectation of 1 divided by 2 is 0.5 Having equal types is nice from a technical perspective, but to quote Steve Jobs, one should start with the user perspective and work one's way backwards to technology. (https://www.youtube.com/watch?v=r2O5qKZlI50)
   Also why have two different ways to do integer division and none for numerical division, so to even out the scales I would petition for having this DIV() function apply numerical division.
   
   Although it also has to be said that DIV usually represents integer division (https://www.w3schools.com/sql/func_mysql_div.asp) so it would be misleading to have it be a numerical division in Druid, so a different name like SAFE_DIV would be better I think.
   It is sad that databases differ so much on what "/" is doing. Since decimal division is the common case, IT should be the default, especially in the context of having a DIV function for doing the integer division. 
   Paragon of consistency indeed, lol :D

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] sascha-coenen edited a comment on issue #9417: SQL: improve division behaviour

Posted by GitBox <gi...@apache.org>.
sascha-coenen edited a comment on issue #9417: SQL: improve division behaviour
URL: https://github.com/apache/druid/issues/9417#issuecomment-591918423
 
 
   > Btw, I've never seen a one-arg NVL function.
   
   yeah. missing an arg. alternatively NULLIF(denominator, 0)
   
   > For the 1/NULL case, I think we should probably return NULL instead of throwing an error.
   
   that would be perfect 
   
   > Would adding that function solve your problem?
   
   yes, it would make things more concise.
   I mentioned the function DIV(nominator, denominator, null-fallback) above which is probably similar to their SAFE_DIVIDE function. 
   Having this as an infix operator rather than in function syntax would be even nicer so that there are less nesting levels, but I guess that's out of the question.
   
   Most importantly though, for this to be a perfectly useful function, it would have to divide numerically, meaning DIV(1, 2) should be 0.5 not 0.
   I would argue that user experience trumps technical arguments. So in the real world the expectation of 1 divided by 2 is 0.5 Having equal types is nice from a technical perspective, but to quote Steve Jobs, one should start with the user perspective and work one's way backwards to technology. (https://www.youtube.com/watch?v=r2O5qKZlI50)
   Also why have two different ways to do integer division and none for numerical division, so to even out the scales I would petition for having this DIV() function apply numerical division.
   
   Although it also has to be said that DIV usually represents integer division (https://www.w3schools.com/sql/func_mysql_div.asp) so it would be misleading to have it be a numerical division in Druid, so a different name like SAFE_DIV would be better I think.
   It is sad that databases differ so much on what "/" is doing. Since decimal division is the common case, IT should be the default, especially in the context of having a DIV function for doing the integer division. 
   Paragon of consistency indeed, lol :D

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on issue #9417: SQL: improve division behaviour

Posted by GitBox <gi...@apache.org>.
gianm commented on issue #9417: SQL: improve division behaviour
URL: https://github.com/apache/druid/issues/9417#issuecomment-591697433
 
 
   I recall looking into this a while back and finding that the behavior of `SELECT 1 / 2` varied across popular databases, and we went with the current behavior (integer division) since it's type-preserving, which seemed like a nice property (if inputs are ints, output will be an int).
   
   I looked into a few systems and here's what I found for integer division behavior. I didn't look at Vertica but I added what you found.
   
   |System|1/2|1/0|1/NULL|
   |------|----|----|----|
   |Druid|0|error|error|
   |Vertica|0.5|error|0|
   |PostgreSQL|0|error|NULL|
   |BigQuery|0.5|error|NULL|
   
   Really, a paragon of consistency 😄 
   
   For the 1/0 case, every system I looked at throws an error. But BigQuery has a SAFE_DIVIDE function that looks interesting. It returns NULL in the 1/0 case and is the same as `/` otherwise. Would adding that function solve your problem?
   
   For the 1/2 case, f you want floating point division for `x / y`, you can do `CAST(x AS DOUBLE) / y`.
   
   For the 1/NULL case, I think we should probably return NULL instead of throwing an error.
   
   Btw, I've never seen a one-arg `NVL` function. Did you mean to use a different function in your example `SUM(nominator) / NVL(SUM(denominator))`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org