You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/04/11 14:05:29 UTC

[GitHub] [arrow] jorisvandenbossche opened a new issue, #35040: [C++] Change Scalar::CastTo to do safe cast by default and allow to specify cast options ?

jorisvandenbossche opened a new issue, #35040:
URL: https://github.com/apache/arrow/issues/35040

   ### Describe the enhancement requested
   
   See https://github.com/apache/arrow/issues/34901 for a longer discussion, but summarizing: the `pyarrow.Scalar` object has a `cast()` method, but in contrast with other cast methods in pyarrow it does an unsafe cast by default. We should probably change this to do a safe cast by default, and at the same time also allow to specify CastOptions (so a user can still choose to do an unsafe cast). 
   
   Example:
   
   ```python
   # scalar behaviour
   >>> pa.scalar(1.5)
   <pyarrow.DoubleScalar: 1.5>
   >>> pa.scalar(1.5).cast(pa.int64())
   <pyarrow.Int64Scalar: 1>
   
   # vs array behaviour
   >>> pa.array([1.5]).cast(pa.int64())
   ...
   ArrowInvalid: Float value 1.5 was truncated converting to int64
   ```
   
   The python cast() method calls the C++ `Scalar::ToCast`:
   
   https://github.com/apache/arrow/blob/e488942cd552ac36a46d40477c1b0326a626ed98/cpp/src/arrow/scalar.h#L99-L100
   
   which currently indeed doesn't have the option to pass CastOptions.
   
   In addition, it seems that for casting Scalars, we do have a somewhat custom implementation, and this doesn't use the generic Cast implementation (from the compute kernels), but has a custom `CastImpl` in scalar.cc. Not fully sure about the reason for this, but maybe historically we wanted to have scalar casting without relying on the optional compute module? (cfr https://github.com/apache/arrow/issues/25025)
   
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF closed issue #35040: [C++] Change Scalar::CastTo to do safe cast by default and allow to specify cast options ?

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF closed issue #35040: [C++] Change Scalar::CastTo to do safe cast by default and allow to specify cast options ?
URL: https://github.com/apache/arrow/issues/35040


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #35040: [C++] Change Scalar::CastTo to do safe cast by default and allow to specify cast options ?

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35040:
URL: https://github.com/apache/arrow/issues/35040#issuecomment-1509258804

   I think this is a good idea.  The result will probably be a slight bit slower but working with scalars is already slightly slower anyways and it would be nice to have only a single implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on issue #35040: [C++] Change Scalar::CastTo to do safe cast by default and allow to specify cast options ?

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on issue #35040:
URL: https://github.com/apache/arrow/issues/35040#issuecomment-1544487275

   Casting of scalar now uses the compute kernel in pyarrow: https://github.com/apache/arrow/pull/35395
   
   Do we want to keep this issue open to also make changes to the `Scalar::CastTo` in the C++? Or maybe linking the PR for pyarrow with a different issue? @jorisvandenbossche 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on issue #35040: [C++] Change Scalar::CastTo to do safe cast by default and allow to specify cast options ?

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on issue #35040:
URL: https://github.com/apache/arrow/issues/35040#issuecomment-1544491929

   Great Dane, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on issue #35040: [C++] Change Scalar::CastTo to do safe cast by default and allow to specify cast options ?

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on issue #35040:
URL: https://github.com/apache/arrow/issues/35040#issuecomment-1530321370

   FYI I tried swapping the Scalar.cast() implementation to the compute kernel and current (but small) test suite passed while also fixing https://github.com/apache/arrow/issues/35370.
   
   ```
   ...
       def cast(self, object target_type=None, safe=None, options=None, memory_pool=None):
           return _pc().cast(self, target_type, safe=safe,
                             options=options, memory_pool=memory_pool)
   ...
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #35040: [C++] Change Scalar::CastTo to do safe cast by default and allow to specify cast options ?

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35040:
URL: https://github.com/apache/arrow/issues/35040#issuecomment-1509259973

   Also, we've since extracted the cast kernels from the rest of the kernels and declared them as "essential kernels" that will always be present (even if ARROW_COMPUTE is OFF) so I think it is safe to rely on them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on issue #35040: [C++] Change Scalar::CastTo to do safe cast by default and allow to specify cast options ?

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on issue #35040:
URL: https://github.com/apache/arrow/issues/35040#issuecomment-1544489467

   I created https://github.com/apache/arrow/pull/35395 to track the issue!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org