You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "smallzhongfeng (via GitHub)" <gi...@apache.org> on 2023/11/23 09:31:33 UTC

[PR] fix: wrong result of range function [arrow-datafusion]

smallzhongfeng opened a new pull request, #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313

   ## Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #8311.
   
   ## Rationale for this change
   When step is a negative number, the decrement calculation is performed from front to back instead of using the `step_by` method under iterator, because `step_by` stipulates that the parameter type is usize
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   ## What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   ## Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   ## Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on code in PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#discussion_r1405475237


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -2514,6 +2520,67 @@ mod tests {
             .is_null(0));
     }
 
+    #[test]
+    fn test_array_range() {

Review Comment:
   I endorse this proposal as it promises to enhance convenience in directing individuals to incorporate new tests, thereby minimizing potential misunderstandings.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on code in PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#discussion_r1404074354


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -741,13 +741,22 @@ pub fn gen_range(args: &[ArrayRef]) -> Result<ArrayRef> {
     let mut offsets = vec![0];
     for (idx, stop) in stop_array.iter().enumerate() {
         let stop = stop.unwrap_or(0);
-        let start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
+        let mut start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
         let step = step_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(1);
         if step == 0 {
             return exec_err!("step can't be 0 for function range(start [, stop, step]");
         }
-        let value = (start..stop).step_by(step as usize);
-        values.extend(value);
+        let value;
+        if step < 0 {
+            while stop < start {
+                values.push(start);
+                start += step;
+            }
+        } else {
+            value = (start..stop).step_by(step as usize);
+            values.extend(value.clone());
+        }
+

Review Comment:
   Would you mind putting the test in sqllogoictest?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "smallzhongfeng (via GitHub)" <gi...@apache.org>.
smallzhongfeng commented on code in PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#discussion_r1404329259


##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -2752,7 +2752,7 @@ select range(5),
        range(1, -5, 1)
 ;

Review Comment:
   Updated~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#discussion_r1405517728


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -2514,6 +2520,67 @@ mod tests {
             .is_null(0));
     }
 
+    #[test]
+    fn test_array_range() {

Review Comment:
   @alamb @Weijun-H 
   There is one problem actually, function like `align_array_dimensions` we have a unit test for it but it is not possible to port to slt file. We should either not test it directly but rely on array function test that call it or port it to datafusion::common::utils, and test it there. I think the latter is better so we can check the coverage more easily.
   
   btw, should we introduce array_utils in common. I think it will grow larger and larger quickly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on code in PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#discussion_r1403601336


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -741,13 +741,22 @@ pub fn gen_range(args: &[ArrayRef]) -> Result<ArrayRef> {
     let mut offsets = vec![0];
     for (idx, stop) in stop_array.iter().enumerate() {
         let stop = stop.unwrap_or(0);
-        let start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
+        let mut start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
         let step = step_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(1);
         if step == 0 {
             return exec_err!("step can't be 0 for function range(start [, stop, step]");
         }
-        let value = (start..stop).step_by(step as usize);
-        values.extend(value);
+        let value;
+        if step < 0 {
+            while stop < start {
+                values.push(start);
+                start += step;
+            }
+        } else {
+            value = (start..stop).step_by(step as usize);
+            values.extend(value.clone());
+        }
+

Review Comment:
   Yes, and I think we should add a test case for it 👍 



##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -741,13 +741,22 @@ pub fn gen_range(args: &[ArrayRef]) -> Result<ArrayRef> {
     let mut offsets = vec![0];
     for (idx, stop) in stop_array.iter().enumerate() {
         let stop = stop.unwrap_or(0);
-        let start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
+        let mut start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
         let step = step_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(1);
         if step == 0 {
             return exec_err!("step can't be 0 for function range(start [, stop, step]");
         }
-        let value = (start..stop).step_by(step as usize);
-        values.extend(value);
+        let value;
+        if step < 0 {
+            while stop < start {
+                values.push(start);
+                start += step;
+            }
+        } else {
+            value = (start..stop).step_by(step as usize);
+            values.extend(value.clone());
+        }
+

Review Comment:
   Yes, and I think we should add a test case for it 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "smallzhongfeng (via GitHub)" <gi...@apache.org>.
smallzhongfeng commented on code in PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#discussion_r1403914468


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -741,13 +741,22 @@ pub fn gen_range(args: &[ArrayRef]) -> Result<ArrayRef> {
     let mut offsets = vec![0];
     for (idx, stop) in stop_array.iter().enumerate() {
         let stop = stop.unwrap_or(0);
-        let start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
+        let mut start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
         let step = step_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(1);
         if step == 0 {
             return exec_err!("step can't be 0 for function range(start [, stop, step]");
         }
-        let value = (start..stop).step_by(step as usize);
-        values.extend(value);
+        let value;
+        if step < 0 {
+            while stop < start {
+                values.push(start);
+                start += step;
+            }
+        } else {
+            value = (start..stop).step_by(step as usize);
+            values.extend(value.clone());
+        }
+

Review Comment:
   OK, I added ut for this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#discussion_r1404710886


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -2514,6 +2520,67 @@ mod tests {
             .is_null(0));
     }
 
+    #[test]
+    fn test_array_range() {

Review Comment:
   remove test here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on code in PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#discussion_r1403493996


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -741,13 +741,22 @@ pub fn gen_range(args: &[ArrayRef]) -> Result<ArrayRef> {
     let mut offsets = vec![0];
     for (idx, stop) in stop_array.iter().enumerate() {
         let stop = stop.unwrap_or(0);
-        let start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
+        let mut start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);

Review Comment:
   ```suggestion
           let start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
   ```



##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -741,13 +741,22 @@ pub fn gen_range(args: &[ArrayRef]) -> Result<ArrayRef> {
     let mut offsets = vec![0];
     for (idx, stop) in stop_array.iter().enumerate() {
         let stop = stop.unwrap_or(0);
-        let start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
+        let mut start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
         let step = step_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(1);
         if step == 0 {
             return exec_err!("step can't be 0 for function range(start [, stop, step]");
         }
-        let value = (start..stop).step_by(step as usize);
-        values.extend(value);
+        let value;
+        if step < 0 {
+            while stop < start {
+                values.push(start);
+                start += step;
+            }
+        } else {
+            value = (start..stop).step_by(step as usize);
+            values.extend(value.clone());
+        }
+

Review Comment:
   ```suggestion
           if step < 0 {
               // Decreasing range
               values.extend((stop..start).rev().step_by((-step) as usize));
           } else {
               // Increasing range
               values.extend((start..stop).step_by(step as usize));
           }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on code in PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#discussion_r1404128100


##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -2752,7 +2752,7 @@ select range(5),
        range(1, -5, 1)
 ;

Review Comment:
   ```suggestion
          range(1, -5, -1)
   ;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#issuecomment-1828739177

   Thank you everyone!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "smallzhongfeng (via GitHub)" <gi...@apache.org>.
smallzhongfeng commented on code in PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#discussion_r1403544744


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -741,13 +741,22 @@ pub fn gen_range(args: &[ArrayRef]) -> Result<ArrayRef> {
     let mut offsets = vec![0];
     for (idx, stop) in stop_array.iter().enumerate() {
         let stop = stop.unwrap_or(0);
-        let start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
+        let mut start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
         let step = step_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(1);
         if step == 0 {
             return exec_err!("step can't be 0 for function range(start [, stop, step]");
         }
-        let value = (start..stop).step_by(step as usize);
-        values.extend(value);
+        let value;
+        if step < 0 {
+            while stop < start {
+                values.push(start);
+                start += step;
+            }
+        } else {
+            value = (start..stop).step_by(step as usize);
+            values.extend(value.clone());
+        }
+

Review Comment:
   Emm, this is good advice, but it should be
    `values.extend((stop+1..start+1).rev().step_by((-step) as usize));`
   Because the stop of the range is not taken.  🤔️



##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -741,13 +741,22 @@ pub fn gen_range(args: &[ArrayRef]) -> Result<ArrayRef> {
     let mut offsets = vec![0];
     for (idx, stop) in stop_array.iter().enumerate() {
         let stop = stop.unwrap_or(0);
-        let start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
+        let mut start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
         let step = step_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(1);
         if step == 0 {
             return exec_err!("step can't be 0 for function range(start [, stop, step]");
         }
-        let value = (start..stop).step_by(step as usize);
-        values.extend(value);
+        let value;
+        if step < 0 {
+            while stop < start {
+                values.push(start);
+                start += step;
+            }
+        } else {
+            value = (start..stop).step_by(step as usize);
+            values.extend(value.clone());
+        }
+

Review Comment:
   Emm, this is good advice, but it should be
    `values.extend((stop+1..start+1).rev().step_by((-step) as usize));`
   Because the stop of the range is not taken.  🤔️



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "smallzhongfeng (via GitHub)" <gi...@apache.org>.
smallzhongfeng commented on code in PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#discussion_r1404093086


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -741,13 +741,22 @@ pub fn gen_range(args: &[ArrayRef]) -> Result<ArrayRef> {
     let mut offsets = vec![0];
     for (idx, stop) in stop_array.iter().enumerate() {
         let stop = stop.unwrap_or(0);
-        let start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
+        let mut start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
         let step = step_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(1);
         if step == 0 {
             return exec_err!("step can't be 0 for function range(start [, stop, step]");
         }
-        let value = (start..stop).step_by(step as usize);
-        values.extend(value);
+        let value;
+        if step < 0 {
+            while stop < start {
+                values.push(start);
+                start += step;
+            }
+        } else {
+            value = (start..stop).step_by(step as usize);
+            values.extend(value.clone());
+        }
+

Review Comment:
   https://github.com/apache/arrow-datafusion/blob/b648d4e22e82989c65523e62312e1995a1543888/datafusion/sqllogictest/test_files/array.slt#L2748C1-L2752C23
   I think there is currently this test, but the results were wrong before, and I have corrected them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "Veeupup (via GitHub)" <gi...@apache.org>.
Veeupup commented on PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#issuecomment-1824538212

   thank you for fixing it!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#discussion_r1405386301


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -2514,6 +2520,67 @@ mod tests {
             .is_null(0));
     }
 
+    #[test]
+    fn test_array_range() {

Review Comment:
   @jayzhan211  and @Weijun-H  I think what is happening is that people follow the model of the existing code -- so as long as there are tests in array_expressions (rather than .slt) that is what people will model.
   
   What do you think about migrating (as part of another PR) all the existing tests to sqllogictests -- this would likely result in all subsequent PRs following the sqllogictests model



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix: wrong result of range function [arrow-datafusion]

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on code in PR #8313:
URL: https://github.com/apache/arrow-datafusion/pull/8313#discussion_r1404129746


##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -2752,7 +2752,7 @@ select range(5),
        range(1, -5, 1)
 ;

Review Comment:
   move the test to the sqllogistest, overall LGTM 👍 



##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -2752,7 +2752,7 @@ select range(5),
        range(1, -5, 1)
 ;

Review Comment:
   move the test to the sqllogistest, overall LGTM 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org