Posted to issues@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/03/20 16:53:39 UTC

[GitHub] [arrow] westonpace opened a new issue, #34644: [C++] Prefer unsafe casting by default in Substrait

westonpace opened a new issue, #34644:
URL: https://github.com/apache/arrow/issues/34644

   ### Describe the enhancement requested
   
   Substrait specifies a failure behavior (return null or error) for casting.  However, it does not specify what exactly constitutes a failure. When we added bindings for the cast expression we assumed that Substrait wanted a safe cast.  However, I have since learned that most existing engines (e.g. postgres, spark, etc.) will perform an unsafe cast by default and only consider a failure when casting is impossible (e.g. "foo" to integer).
   
   I'd like to eventually see Substrait add support for specifying more precisely what kind of casting should occur.  In the meantime I think we should update the default to unsafe casting to match user expectation.
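
To make the distinction above concrete, here is a minimal sketch of an "impossible" cast (a non-numeric string to an integer) handled under the two failure behaviors mentioned. The `FailureBehavior` enum and `cast_str_to_int` helper are hypothetical names for illustration only, not Substrait or Arrow APIs.

```python
from enum import Enum
from typing import Optional


class FailureBehavior(Enum):
    # Hypothetical names standing in for the two behaviors described above.
    RETURN_NULL = "return_null"
    ERROR = "error"


def cast_str_to_int(value: str, on_failure: FailureBehavior) -> Optional[int]:
    """Cast a string to an int; a non-numeric string is an impossible cast."""
    try:
        return int(value)
    except ValueError:
        if on_failure is FailureBehavior.RETURN_NULL:
            return None  # None stands in for a null result
        raise  # ERROR: propagate the failure to the caller


print(cast_str_to_int("42", FailureBehavior.ERROR))
print(cast_str_to_int("foo", FailureBehavior.RETURN_NULL))
```

Under this reading, the failure behavior only decides what happens once a cast has failed; whether lossy conversions such as overflow or truncation count as failures is the separate question this issue raises.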
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on issue #34644: [C++] Prefer unsafe casting by default in Substrait

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on issue #34644:
URL: https://github.com/apache/arrow/issues/34644#issuecomment-1485098872

   @westonpace For now I am fine with the default behavior. One situation where I think we would need more flexibility is if we found that some Acero "unsafe" casting behavior differs from pandas (which is the standard we apply internally); in that case we would probably want some finer control over the Acero default behavior. But we haven't found an issue yet.




[GitHub] [arrow] jorisvandenbossche commented on issue #34644: [C++] Prefer unsafe casting by default in Substrait

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #34644:
URL: https://github.com/apache/arrow/issues/34644#issuecomment-1477411956

   On the other hand, float truncation is indeed done by postgres:
   
   ```
   test_db=# SELECT CAST (1.5 AS integer);
    int4 
   ------
       2
   (1 row)
   ```
   
   (although it is interesting to see that the result is 2, while in arrow casting 1.5 to int64 with allow_float_truncate=True gives 1)
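
The difference between the two results can be sketched in plain Python. This assumes the postgres result comes from rounding to the nearest integer with ties going away from zero, which is consistent with the output above but is not confirmed by it; the helper names are hypothetical.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def cast_truncate(value: float) -> int:
    """Drop the fractional part (round toward zero)."""
    return math.trunc(value)


def cast_round_half_up(value: str) -> int:
    """Round to nearest, ties away from zero (assumed postgres-like behavior)."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(cast_truncate(1.5))         # truncation
print(cast_round_half_up("1.5"))  # rounding
```

Both conversions lose information, so both would be rejected by a strictly safe cast; they just disagree on which integer to return.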




[GitHub] [arrow] icexelloss closed issue #34644: [C++] Prefer unsafe casting by default in Substrait

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss closed issue #34644: [C++] Prefer unsafe casting by default in Substrait
URL: https://github.com/apache/arrow/issues/34644




[GitHub] [arrow] jorisvandenbossche commented on issue #34644: [C++] Prefer unsafe casting by default in Substrait

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #34644:
URL: https://github.com/apache/arrow/issues/34644#issuecomment-1477405158

   > However, I have since learned that most existing engines (e.g. postgres, spark, etc.) will perform an unsafe cast by default
   
   For postgres, that is not what I see based on a quick test for integer overflow:
   
   ```
   test_db=# SELECT CAST (10000000 AS smallint);
   ERROR:  smallint out of range
   ```
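
For comparison, a rough sketch of the two ways an out-of-range integer cast to a 16-bit type can behave: a checked cast that errors (as postgres does above) and an unchecked cast that wraps around in two's complement. The wrapping variant is an assumption about what an unchecked narrowing conversion typically does, and both helper names are hypothetical.

```python
INT16_MIN = -2**15
INT16_MAX = 2**15 - 1


def cast_int16_checked(value: int) -> int:
    """Raise if the value does not fit in a signed 16-bit integer."""
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError(f"{value} out of range for int16")
    return value


def cast_int16_wrapping(value: int) -> int:
    """Keep only the low 16 bits, reinterpreted as a signed value."""
    return (value - INT16_MIN) % 2**16 + INT16_MIN


print(cast_int16_wrapping(10000000))
try:
    cast_int16_checked(10000000)
except OverflowError as exc:
    print(f"ERROR: {exc}")
```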




[GitHub] [arrow] westonpace commented on issue #34644: [C++] Prefer unsafe casting by default in Substrait

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #34644:
URL: https://github.com/apache/arrow/issues/34644#issuecomment-1478724350

   It seems SQL Server truncates, so I think we'll need both behaviors.  This sounds like something else we will need to document more thoroughly in Substrait, and it will also need to be added to the list of dialect behaviors.  My original motivation was to allow float->int.  I'm pretty open to whatever default we want (e.g. allow float/decimal truncation but not integer truncation) but I believe @icexelloss wanted something more flexible than defaulting to "safe only".
   

