You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/02/14 18:54:04 UTC

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3695: feat: Sort kernel for `RunArray`

tustvold commented on code in PR #3695:
URL: https://github.com/apache/arrow-rs/pull/3695#discussion_r1106244242


##########
arrow-array/src/run_iterator.rs:
##########
@@ -57,13 +57,8 @@ where
 {
     /// create a new iterator
     pub fn new(array: TypedRunArray<'a, R, V>) -> Self {
-        let current_front_physical: usize =
-            array.run_array().get_physical_index(0).unwrap();
-        let current_back_physical: usize = array
-            .run_array()
-            .get_physical_index(array.len() - 1)
-            .unwrap()
-            + 1;
+        let current_front_physical: usize = array.run_array().get_start_physical_index();
+        let current_back_physical: usize = array.run_array().get_end_physical_index() + 1;

Review Comment:
   ```suggestion
           let current_front_physical = array.run_array().get_start_physical_index();
           let current_back_physical = array.run_array().get_end_physical_index() + 1;
   ```



##########
arrow-ord/src/sort.rs:
##########
@@ -599,6 +618,206 @@ fn insert_valid_values<T>(result_slice: &mut [u32], offset: usize, valids: &[(u3
     append_valids(&mut result_slice[offset..offset + valids.len()]);
 }
 
+// Sort run array and return sorted run array.
+// The output RunArray will be encoded at the same level as input run array.
+// For e.g. an input RunArray { run_ends = [2,4,6,8], values = [1,2,1,2] }
+// will result in output RunArray { run_ends = [2,4,6,8], values = [1,1,2,2] }
+// and not RunArray { run_ends = [4,8], values = [1,2] }
+fn sort_run(

Review Comment:
   How much of a performance difference does this special case have? At least for query engines, it is very rare to be sorting a single column by itself, more common is sorting multiple columns by one (i.e. sort_to_indices) or more (i.e. lexsort_to_indices) other columns? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org