You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "jonahgao (via GitHub)" <gi...@apache.org> on 2024/03/08 09:41:49 UTC

[PR] fix: `generate_series` and `range` panic on edge cases [arrow-datafusion]

jonahgao opened a new pull request, #9503:
URL: https://github.com/apache/arrow-datafusion/pull/9503

   ## Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #9499.
   
   ## Rationale for this change
   Some arithmetic operations on `i64` may overflow, causing panic. For example
   https://github.com/apache/arrow-datafusion/blob/fc81bf10ea7cde55c6d58a670e15bfd0581ec8c2/datafusion/functions-array/src/kernels.rs#L303
   https://github.com/apache/arrow-datafusion/blob/fc81bf10ea7cde55c6d58a670e15bfd0581ec8c2/datafusion/functions-array/src/kernels.rs#L304
   https://github.com/apache/arrow-datafusion/blob/fc81bf10ea7cde55c6d58a670e15bfd0581ec8c2/datafusion/functions-array/src/kernels.rs#L307
   
   This PR tries to replace these arithmetic operations with `RangeInclusive` to avoid overflow.
   ## What changes are included in this PR?
   fix panic
   
   ## Are these changes tested?
   Yes
   
   ## Are there any user-facing changes?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: `generate_series` and `range` panic on edge cases [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #9503:
URL: https://github.com/apache/arrow-datafusion/pull/9503


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: `generate_series` and `range` panic on edge cases [arrow-datafusion]

Posted by "jonahgao (via GitHub)" <gi...@apache.org>.
jonahgao commented on code in PR #9503:
URL: https://github.com/apache/arrow-datafusion/pull/9503#discussion_r1517719118


##########
datafusion/functions-array/src/kernels.rs:
##########
@@ -306,9 +306,15 @@ pub(super) fn gen_range(args: &[ArrayRef], include_upper: bool) -> Result<ArrayR
             );
         }
         // Below, we utilize `usize` to represent steps.
-        // But on 32-bit targets, the absolute value of `i64` may fail to fit into `usize`.
+        #[cfg(target_pointer_width = "64")]
+        let step_abs = step.unsigned_abs() as usize;
+        // On 32-bit targets, the absolute value of `i64` may fail to fit into `usize`.
+        #[cfg(not(target_pointer_width = "64"))]
         let step_abs = usize::try_from(step.unsigned_abs()).map_err(|_| {

Review Comment:
   > If `usize::try_from(step.unsigned_abs()).map_err` can handle whatever the target size is, I think we don't need to split the code for two condition
   
   Perhaps simpler is better. Let me merge them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: `generate_series` and `range` panic on edge cases [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #9503:
URL: https://github.com/apache/arrow-datafusion/pull/9503#issuecomment-1986432494

   Thank you @jonahgao and @jayzhan211 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: `generate_series` and `range` panic on edge cases [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #9503:
URL: https://github.com/apache/arrow-datafusion/pull/9503#discussion_r1517711410


##########
datafusion/functions-array/src/kernels.rs:
##########
@@ -306,9 +306,15 @@ pub(super) fn gen_range(args: &[ArrayRef], include_upper: bool) -> Result<ArrayR
             );
         }
         // Below, we utilize `usize` to represent steps.
-        // But on 32-bit targets, the absolute value of `i64` may fail to fit into `usize`.
+        #[cfg(target_pointer_width = "64")]
+        let step_abs = step.unsigned_abs() as usize;
+        // On 32-bit targets, the absolute value of `i64` may fail to fit into `usize`.
+        #[cfg(not(target_pointer_width = "64"))]
         let step_abs = usize::try_from(step.unsigned_abs()).map_err(|_| {

Review Comment:
   If we are running in 32 bit target machine, are we able to reach the code here and get step with i64? 🤔 



##########
datafusion/functions-array/src/kernels.rs:
##########
@@ -306,9 +306,15 @@ pub(super) fn gen_range(args: &[ArrayRef], include_upper: bool) -> Result<ArrayR
             );
         }
         // Below, we utilize `usize` to represent steps.
-        // But on 32-bit targets, the absolute value of `i64` may fail to fit into `usize`.
+        #[cfg(target_pointer_width = "64")]
+        let step_abs = step.unsigned_abs() as usize;
+        // On 32-bit targets, the absolute value of `i64` may fail to fit into `usize`.
+        #[cfg(not(target_pointer_width = "64"))]
         let step_abs = usize::try_from(step.unsigned_abs()).map_err(|_| {

Review Comment:
   If we are running in 32-bit target machine, are we able to reach the code here and get the `step` with i64? 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: `generate_series` and `range` panic on edge cases [arrow-datafusion]

Posted by "jonahgao (via GitHub)" <gi...@apache.org>.
jonahgao commented on code in PR #9503:
URL: https://github.com/apache/arrow-datafusion/pull/9503#discussion_r1517740398


##########
datafusion/functions-array/src/kernels.rs:
##########
@@ -306,9 +306,15 @@ pub(super) fn gen_range(args: &[ArrayRef], include_upper: bool) -> Result<ArrayR
             );
         }
         // Below, we utilize `usize` to represent steps.
-        // But on 32-bit targets, the absolute value of `i64` may fail to fit into `usize`.
+        #[cfg(target_pointer_width = "64")]
+        let step_abs = step.unsigned_abs() as usize;
+        // On 32-bit targets, the absolute value of `i64` may fail to fit into `usize`.
+        #[cfg(not(target_pointer_width = "64"))]
         let step_abs = usize::try_from(step.unsigned_abs()).map_err(|_| {

Review Comment:
   > If we are running in 32-bit target machine, are we able to reach the code here and get the `step` with i64? 🤔
   
   And Rust offers all integer types all the way up to 128-bit ones on all architectures.
   From: https://users.rust-lang.org/t/numbers-in-rust/87667



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: `generate_series` and `range` panic on edge cases [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #9503:
URL: https://github.com/apache/arrow-datafusion/pull/9503#discussion_r1517709553


##########
datafusion/functions-array/src/kernels.rs:
##########
@@ -306,9 +306,15 @@ pub(super) fn gen_range(args: &[ArrayRef], include_upper: bool) -> Result<ArrayR
             );
         }
         // Below, we utilize `usize` to represent steps.
-        // But on 32-bit targets, the absolute value of `i64` may fail to fit into `usize`.
+        #[cfg(target_pointer_width = "64")]
+        let step_abs = step.unsigned_abs() as usize;
+        // On 32-bit targets, the absolute value of `i64` may fail to fit into `usize`.
+        #[cfg(not(target_pointer_width = "64"))]
         let step_abs = usize::try_from(step.unsigned_abs()).map_err(|_| {

Review Comment:
   If `usize::try_from(step.unsigned_abs()).map_err` can handle whatever the target size is, I think we don't need to split the code for two condition



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: `generate_series` and `range` panic on edge cases [arrow-datafusion]

Posted by "jonahgao (via GitHub)" <gi...@apache.org>.
jonahgao commented on code in PR #9503:
URL: https://github.com/apache/arrow-datafusion/pull/9503#discussion_r1517720672


##########
datafusion/functions-array/src/kernels.rs:
##########
@@ -306,9 +306,15 @@ pub(super) fn gen_range(args: &[ArrayRef], include_upper: bool) -> Result<ArrayR
             );
         }
         // Below, we utilize `usize` to represent steps.
-        // But on 32-bit targets, the absolute value of `i64` may fail to fit into `usize`.
+        #[cfg(target_pointer_width = "64")]
+        let step_abs = step.unsigned_abs() as usize;
+        // On 32-bit targets, the absolute value of `i64` may fail to fit into `usize`.
+        #[cfg(not(target_pointer_width = "64"))]
         let step_abs = usize::try_from(step.unsigned_abs()).map_err(|_| {

Review Comment:
   > If we are running in 32-bit target machine, are we able to reach the code here and get the `step` with i64? 🤔
   
   I think it's possible, because `step` comes from user input.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: `generate_series` and `range` panic on edge cases [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #9503:
URL: https://github.com/apache/arrow-datafusion/pull/9503#discussion_r1517711410


##########
datafusion/functions-array/src/kernels.rs:
##########
@@ -306,9 +306,15 @@ pub(super) fn gen_range(args: &[ArrayRef], include_upper: bool) -> Result<ArrayR
             );
         }
         // Below, we utilize `usize` to represent steps.
-        // But on 32-bit targets, the absolute value of `i64` may fail to fit into `usize`.
+        #[cfg(target_pointer_width = "64")]
+        let step_abs = step.unsigned_abs() as usize;
+        // On 32-bit targets, the absolute value of `i64` may fail to fit into `usize`.
+        #[cfg(not(target_pointer_width = "64"))]
         let step_abs = usize::try_from(step.unsigned_abs()).map_err(|_| {

Review Comment:
   If we are running in 32 bit target machine, do we able to reach the code here and get step with i64? 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org