You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/07 15:08:06 UTC

[GitHub] [arrow-rs] tustvold opened a new pull request, #2846: Cleanup cast kernel

tustvold opened a new pull request, #2846:
URL: https://github.com/apache/arrow-rs/pull/2846

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #.
   
   # Rationale for this change
    
   <!--
   Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
   Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   The cast kernel predates a lot of the work to introduce safe APIs for interacting with arrays, inspired by the work of arrow2. This updates the cast kernel to make use of these APIs :tada:
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   Uses PrimitiveArray::reinterpret_cast added by #2785 to reduce use of unsafe
   
   Cleans up some other callsites
   
   # Are there any user-facing changes?
   
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2846: Cleanup cast kernel

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2846:
URL: https://github.com/apache/arrow-rs/pull/2846#discussion_r990606974


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1329,30 +1321,13 @@ pub fn cast_with_options(
             let converted = if from_size >= to_size {
                 divide_scalar(&time_array, from_size / to_size)?
             } else {
-                multiply(
-                    &time_array,
-                    &Int64Array::from(vec![to_size / from_size; array.len()]),
-                )?
+                multiply_scalar(&time_array, to_size / from_size)?
             };
-            let array_ref = Arc::new(converted) as ArrayRef;
-            use TimeUnit::*;
-            match to_unit {
-                Second => {
-                    cast_array_data::<TimestampSecondType>(&array_ref, to_type.clone())
-                }
-                Millisecond => cast_array_data::<TimestampMillisecondType>(
-                    &array_ref,
-                    to_type.clone(),
-                ),
-                Microsecond => cast_array_data::<TimestampMicrosecondType>(
-                    &array_ref,
-                    to_type.clone(),
-                ),
-                Nanosecond => cast_array_data::<TimestampNanosecondType>(
-                    &array_ref,
-                    to_type.clone(),
-                ),
-            }
+            Ok(make_timestamp_array(
+                &converted,
+                to_unit.clone(),
+                to_tz.clone(),
+            ))

Review Comment:
   Not sure, the values should be UTC. Regardless this was the prior behaviour and there is a ticket already taking this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2846: Cleanup cast kernel

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2846:
URL: https://github.com/apache/arrow-rs/pull/2846#discussion_r990606974


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1329,30 +1321,13 @@ pub fn cast_with_options(
             let converted = if from_size >= to_size {
                 divide_scalar(&time_array, from_size / to_size)?
             } else {
-                multiply(
-                    &time_array,
-                    &Int64Array::from(vec![to_size / from_size; array.len()]),
-                )?
+                multiply_scalar(&time_array, to_size / from_size)?
             };
-            let array_ref = Arc::new(converted) as ArrayRef;
-            use TimeUnit::*;
-            match to_unit {
-                Second => {
-                    cast_array_data::<TimestampSecondType>(&array_ref, to_type.clone())
-                }
-                Millisecond => cast_array_data::<TimestampMillisecondType>(
-                    &array_ref,
-                    to_type.clone(),
-                ),
-                Microsecond => cast_array_data::<TimestampMicrosecondType>(
-                    &array_ref,
-                    to_type.clone(),
-                ),
-                Nanosecond => cast_array_data::<TimestampNanosecondType>(
-                    &array_ref,
-                    to_type.clone(),
-                ),
-            }
+            Ok(make_timestamp_array(
+                &converted,
+                to_unit.clone(),
+                to_tz.clone(),
+            ))

Review Comment:
   Not sure, the values should be UTC. Regardless this was the prior behaviour and there is a ticket already taking this https://github.com/apache/arrow-rs/issues/1936



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2846: Cleanup cast kernel

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2846:
URL: https://github.com/apache/arrow-rs/pull/2846#discussion_r990236304


##########
arrow-array/src/array/primitive_array.rs:
##########
@@ -769,11 +769,16 @@ impl<T: ArrowTimestampType> PrimitiveArray<T> {
 
     /// Construct a timestamp array with new timezone
     pub fn with_timezone(&self, timezone: String) -> Self {
+        self.with_timezone_opt(Some(timezone))

Review Comment:
   I went to add this method as was delighted to find #2347 @viirya had already added it. Apparently I reviewed it but have no recollection :laughing: 
   
   Added an optional variant of it, as for this specific case it makes it easier to work with



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #2846: Cleanup cast kernel

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #2846:
URL: https://github.com/apache/arrow-rs/pull/2846#discussion_r990610425


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1329,30 +1321,13 @@ pub fn cast_with_options(
             let converted = if from_size >= to_size {
                 divide_scalar(&time_array, from_size / to_size)?
             } else {
-                multiply(
-                    &time_array,
-                    &Int64Array::from(vec![to_size / from_size; array.len()]),
-                )?
+                multiply_scalar(&time_array, to_size / from_size)?
             };
-            let array_ref = Arc::new(converted) as ArrayRef;
-            use TimeUnit::*;
-            match to_unit {
-                Second => {
-                    cast_array_data::<TimestampSecondType>(&array_ref, to_type.clone())
-                }
-                Millisecond => cast_array_data::<TimestampMillisecondType>(
-                    &array_ref,
-                    to_type.clone(),
-                ),
-                Microsecond => cast_array_data::<TimestampMicrosecondType>(
-                    &array_ref,
-                    to_type.clone(),
-                ),
-                Nanosecond => cast_array_data::<TimestampNanosecondType>(
-                    &array_ref,
-                    to_type.clone(),
-                ),
-            }
+            Ok(make_timestamp_array(
+                &converted,
+                to_unit.clone(),
+                to_tz.clone(),
+            ))

Review Comment:
   Okay as it is the prior behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #2846: Cleanup cast kernel

Posted by GitBox <gi...@apache.org>.
tustvold merged PR #2846:
URL: https://github.com/apache/arrow-rs/pull/2846


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #2846: Cleanup cast kernel

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #2846:
URL: https://github.com/apache/arrow-rs/pull/2846#discussion_r990605059


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1329,30 +1321,13 @@ pub fn cast_with_options(
             let converted = if from_size >= to_size {
                 divide_scalar(&time_array, from_size / to_size)?
             } else {
-                multiply(
-                    &time_array,
-                    &Int64Array::from(vec![to_size / from_size; array.len()]),
-                )?
+                multiply_scalar(&time_array, to_size / from_size)?
             };
-            let array_ref = Arc::new(converted) as ArrayRef;
-            use TimeUnit::*;
-            match to_unit {
-                Second => {
-                    cast_array_data::<TimestampSecondType>(&array_ref, to_type.clone())
-                }
-                Millisecond => cast_array_data::<TimestampMillisecondType>(
-                    &array_ref,
-                    to_type.clone(),
-                ),
-                Microsecond => cast_array_data::<TimestampMicrosecondType>(
-                    &array_ref,
-                    to_type.clone(),
-                ),
-                Nanosecond => cast_array_data::<TimestampNanosecondType>(
-                    &array_ref,
-                    to_type.clone(),
-                ),
-            }
+            Ok(make_timestamp_array(
+                &converted,
+                to_unit.clone(),
+                to_tz.clone(),
+            ))

Review Comment:
   Hmm, for this kind of conversion between timezones, I think `from_tz` must be non empty. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2846: Cleanup cast kernel

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2846:
URL: https://github.com/apache/arrow-rs/pull/2846#discussion_r990194166


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1617,33 +1561,6 @@ fn cast_decimal_to_decimal<const BYTE_WIDTH1: usize, const BYTE_WIDTH2: usize>(
     }
 }
 
-/// Cast an array by changing its array_data type to the desired type

Review Comment:
   This function isn't sound, as it basically allows arbitrary transmutation of ArrayData :sweat_smile:
   
   Tbh this was the major motivator for this PR, whilst unsoundness in internal APIs is not nearly as bad as in public APIs, it is still something to be avoided.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #2846: Cleanup cast kernel

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #2846:
URL: https://github.com/apache/arrow-rs/pull/2846#issuecomment-1271731959

   Need to revisit how this is handling timestamps


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] ursabot commented on pull request #2846: Cleanup cast kernel

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #2846:
URL: https://github.com/apache/arrow-rs/pull/2846#issuecomment-1272275465

   Benchmark runs are scheduled for baseline = 5cf46d43e8ede2ff3f291de85a27fce25ff2e9c4 and contender = 9db75183114ba0bb8dbbe4d3c52a60aead66f0e4. 9db75183114ba0bb8dbbe4d3c52a60aead66f0e4 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/f8490e1a485e42418a9e2998ac0571fe...420bc86b49b44cdab9d65ec9177306b2/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/4ee81d442b1d4c77a534c40dc018a0ac...5c307bd5480a4f2d9fd03fd784d6d8bb/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/906ee81f341441f7a9c1d1112807db52...8b14cef6d67c446d9f266c6dea8c129b/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/383b53cd50ca4203b650d99b4d4d1280...5cc2a2962879451491b11dca9c0fb08c/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #2846: Cleanup cast kernel

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #2846:
URL: https://github.com/apache/arrow-rs/pull/2846#issuecomment-1272274350

   As ever, thank you for reviewing my PRs 😄


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #2846: Cleanup cast kernel

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #2846:
URL: https://github.com/apache/arrow-rs/pull/2846#issuecomment-1272356245

   @tustvold You're welcome! ❤️ 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org