You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "danepitkin (via GitHub)" <gi...@apache.org> on 2023/05/11 18:29:52 UTC

[GitHub] [arrow] danepitkin opened a new issue, #35560: [Python][C++] Remove Scalar legacy cast implementation

danepitkin opened a new issue, #35560:
URL: https://github.com/apache/arrow/issues/35560

   ### Describe the enhancement requested
   
   Let's delete the legacy implementation `Scalar.CastTo()` from Arrow.
   
   Arrow has a compute kernel for cast operations that is now included in the base library[1]. PyArrow has migrated Scalar cast usage to this computer kernel[2].
   
   [1] https://github.com/apache/arrow/pull/34295
   [2] https://github.com/apache/arrow/pull/35395 
   
   ### Component(s)
   
   C++, Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [C++] Remove Scalar legacy cast implementation [arrow]

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #35560:
URL: https://github.com/apache/arrow/issues/35560#issuecomment-1837254219

   Let's create sub issues and use separated PRs!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [C++] Remove Scalar legacy cast implementation [arrow]

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou closed issue #35560: [C++] Remove Scalar legacy cast implementation
URL: https://github.com/apache/arrow/issues/35560


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [C++] Remove Scalar legacy cast implementation [arrow]

Posted by "llama90 (via GitHub)" <gi...@apache.org>.
llama90 commented on issue #35560:
URL: https://github.com/apache/arrow/issues/35560#issuecomment-1837151416

   Hello,
   
   I am interested in this issue and looking into it.
   
   First, I found functions that are using the legacy `Scalar.CastTo()` in the code.
   
   I have reorganized these functions to call the `Cast` compute kernel where tests are possible.
   
   | file name         | code snippet                                                                               | note |
   |-------------------|--------------------------------------------------------------------------------------------|------|
   | file_parquet.cc   | `auto maybe_min = min->CastTo(field.type());`                                              | -    |
   | file_parquet.cc   | `auto maybe_max = max->CastTo(field.type());`                                              | -    |
   | partition_test.cc | `ASSERT_OK_AND_ASSIGN(auto dict_hello, MakeScalar("hello")->CastTo(DictStr("")->type()));` | 1    |
   | partition_test.cc | `ASSERT_OK_AND_ASSIGN(auto dict_hello, MakeScalar("hello")->CastTo(dict_type));`           | 1    |
   | partition_test.cc | `ASSERT_OK_AND_ASSIGN(auto day, StringScalar("2020-06-08").CastTo(temporal));`             | ✅    |
   | scalar.cc         | `auto maybe_repr = CastTo(utf8());`                                                        | -    |
   | scalar.cc         | `ARROW_ASSIGN_OR_RAISE(auto cast_value, from_.CastTo(dict_type.value_type()));`            | -    |
   | scalar.cc         | `return Int32Scalar(0).CastTo(dict_type.index_type()).Value(&out.index);`                  | -    |
   | scalar.h          | `Result<std::shared_ptr<Scalar>> CastTo(std::shared_ptr<DataType> to) const;`              | -    |
   | scalar_test.cc    | `TimestampScalar(value, timestamp(in)).CastTo(timestamp(out)).ValueOrDie();`               | 2    |
   | scalar_test.cc    | `TimestampScalar(1024, timestamp(TimeUnit::MILLI)).CastTo(utf8()));`                       | 2    |
   | scalar_test.cc    | `TimestampScalar(1024, timestamp(TimeUnit::MILLI)).CastTo(int64()));`                      | 2    |
   | scalar_test.cc    | `.CastTo(date64()));`                                                                      | 2    |
   | scalar_test.cc    | `ASSERT_OK_AND_ASSIGN(auto cast_to_other, scalar->CastTo(other_type))`                     | ✅    |
   | scalar_test.cc    | `ASSERT_OK_AND_ASSIGN(auto cast_from_other, other_scalar->CastTo(type))`                   | ✅    |
   | scalar_test.cc    | `StringScalar(std::string(repr)).CastTo(type));`                                           | ✅    |
   | scalar_test.cc    | `ASSERT_OK_AND_ASSIGN(auto cast_to_string, scalar->CastTo(utf8()));`                       | ✅    |
   | scalar_test.cc    | `EXPECT_OK_AND_ASSIGN(auto cast_scalar, scalar.CastTo(to_type));`                          | 3    |
   | scalar_test.cc    | `scalar.CastTo(to_type));`                                                                 | 3    |
   | scalar_test.cc    | `ASSERT_OK_AND_ASSIGN(auto cast_alpha, alpha->CastTo(ty));`                                | 1    |
   
   * 1: Unsupported cast from string to dictionary using function cast_dictionary
   * 2: ValueOrDie called on an error: Invalid: Casting from timestamp[ns] to timestamp[us] would lose data: 1234
   * 3: Unsupported cast from list_view<item: int16> to list using function cast_list
   * **In some functions, the same test cases may yield different errors depending on the additional cast kernel implementation**.
       * This means that in some cases, the errors are indicated based on the error messages triggered by the functions executed first.
   
   Looking at the table, the calling functions can be organized as follows:
   
   1. Cases where the cast operation is implemented and calling the Cast function works without additional implementation.
   2. Cases where additional implementation of the cast operation is needed.
   
   For now, I have submitted a PR for the cases that are working happy case, and I am curious if this aligns with the intentions mentioned in the issue.
   
   If PR aligns with the intent, I would like to proceed by separating the happy cases from those that require additional cast implementation, and I would appreciate your feedback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [C++] Remove Scalar legacy cast implementation [arrow]

Posted by "llama90 (via GitHub)" <gi...@apache.org>.
llama90 commented on issue #35560:
URL: https://github.com/apache/arrow/issues/35560#issuecomment-1837304923

   Hello @kou
   
   Thank you for your response. 
   
   I've created the issue excluding the happy cases and will first complete the PR for the happy cases. 
   
   I'm a bit concerned as I have never dealt with compute kernels and have not yet fully finalized the sorting for the dictionary for ChunkedArray and the other issues.
   
   However, I've recently had more time to look into Arrow, so I will proceed step by step.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [C++] Remove Scalar legacy cast implementation [arrow]

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #35560:
URL: https://github.com/apache/arrow/issues/35560#issuecomment-1837397253

   Thanks!
   I've added sub-issues to this issue description.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org