You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/08/24 05:39:23 UTC

[GitHub] [arrow-rs] viirya opened a new issue, #2568: Add sql-compliant feature for enabling sql-compliant kernel behavior

viirya opened a new issue, #2568:
URL: https://github.com/apache/arrow-rs/issues/2568

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   <!--
   A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 
   (This section helps Arrow developers understand the context and *why* for this feature, in addition to  the *what*)
   -->
   
   Some kernels behaves different with SQL semantics. For example, NaN is equal to NaN with SQL semantics. Using current comparison kernels in SQL system leads to different behavior and generates incorrect results.
   
   **Describe the solution you'd like**
   <!--
   A clear and concise description of what you want to happen.
   -->
   
   We should provide SQL-compliant kernels which can be enabled by feature flag.
   
   **Describe alternatives you've considered**
   <!--
   A clear and concise description of any alternative solutions or features you've considered.
   -->
   
   **Additional context**
   <!--
   Add any other context or screenshots about the feature request here.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #2568: Add sql-compliant feature for enabling sql-compliant kernel behavior

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #2568:
URL: https://github.com/apache/arrow-rs/issues/2568#issuecomment-1230168260

   There is an IEEE standard for total ordering of NaNs, I wonder if we could use that? Aside from being a standard, where I've not managed to find an authoritative SQL standard for floats, it can be implemented with relatively cheap bit manipulation, instead of more expensive branching.
   
   There is a built-in implementation [here](https://doc.rust-lang.org/std/primitive.f32.html#method.total_cmp). Theoretically the performance may be good enough that we can just always have this behaviour without needing a feature flag?
   
   What do you think?
   
   Also tagging @alamb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on issue #2568: Add sql-compliant feature for enabling sql-compliant kernel behavior

Posted by GitBox <gi...@apache.org>.
viirya commented on issue #2568:
URL: https://github.com/apache/arrow-rs/issues/2568#issuecomment-1230594220

   I think `total_cmp` looks promising as it treats NaN ordering the same way as Spark/Postgresql.  cc @sunchao to take a look too in case if I miss any point here. 😄 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya closed issue #2568: Add sql-compliant feature for enabling sql-compliant kernel behavior

Posted by GitBox <gi...@apache.org>.
viirya closed issue #2568: Add sql-compliant feature for enabling sql-compliant kernel behavior
URL: https://github.com/apache/arrow-rs/issues/2568


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on issue #2568: Add sql-compliant feature for enabling sql-compliant kernel behavior

Posted by GitBox <gi...@apache.org>.
viirya commented on issue #2568:
URL: https://github.com/apache/arrow-rs/issues/2568#issuecomment-1230579764

   Hmm, the NaN ordering of the IEEE standard treats NaN value larger than any other numbers. Looks it is consistent with the SQL-compliant ordering we need. I can rewrite these kernels with [total_cmp](https://doc.rust-lang.org/std/primitive.f32.html#method.total_cmp). As without a flag, it is somehow breaking existing behavior but I think current behavior can be thought as wrong or not useful.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] NGA-TRAN commented on issue #2568: Add sql-compliant feature for enabling sql-compliant kernel behavior

Posted by GitBox <gi...@apache.org>.
NGA-TRAN commented on issue #2568:
URL: https://github.com/apache/arrow-rs/issues/2568#issuecomment-1230615467

   @viirya 
   > For example, by definition, NaN is not equal to itself but NaN is equal to NaN with SQL semantics
   Can you provide a more specific example for `NaN is equal to NaN with SQL semantics`? This document says NaN is not equal to itself in SQL https://www.vertica.com/blog/vertica-quick-tip-query-nan-values/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] sunchao commented on issue #2568: Add sql-compliant feature for enabling sql-compliant kernel behavior

Posted by GitBox <gi...@apache.org>.
sunchao commented on issue #2568:
URL: https://github.com/apache/arrow-rs/issues/2568#issuecomment-1230656665

   > I think total_cmp looks promising as it treats NaN ordering the same way as Spark/Postgresql. cc @sunchao to take a look too in case if I miss any point here. 😄
   
   I took a look too, and yes it looks like `total_cmp` exhibits the same behavior as Spark/Postgres/Snowflake etc, so I think it could be good idea to replace the existing code with it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on issue #2568: Add sql-compliant feature for enabling sql-compliant kernel behavior

Posted by GitBox <gi...@apache.org>.
viirya commented on issue #2568:
URL: https://github.com/apache/arrow-rs/issues/2568#issuecomment-1230654051

   We've discussed this in PRs. It depends on SQL implementations. As far as I know, Spark and postgresql treats NaNs equal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org