You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "askoa (via GitHub)" <gi...@apache.org> on 2023/02/04 19:34:34 UTC

[GitHub] [arrow-rs] askoa opened a new pull request, #3662: IPC support for run array.

askoa opened a new pull request, #3662:
URL: https://github.com/apache/arrow-rs/pull/3662

   # Which issue does this PR close?
   
   Part of #3520
   
   # Rationale for this change
   See issue description. 
   
   # What changes are included in this PR?
   
    - Change to flat buffer definition file (incorporate the change from arrow project).
    - Change to the code generated by flat buffer compiler.
    - Changes to the script used to run flat buffer compiler.
   
   # Are there any user-facing changes?
   
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102144006


##########
arrow-data/src/equal/run.rs:
##########
@@ -0,0 +1,84 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::ArrayData;
+
+use super::equal_range;
+
+/// The current implementation of comparison of run array support physical comparison.
+/// Comparing run encoded array based on logical indices (`lhs_start`, `rhs_start`) will
+/// be time consuming as converting from logical index to physical index cannot be done
+/// in constant time. The current comparison compares the underlying physical arrays.
+pub(super) fn run_equal(
+    lhs: &ArrayData,
+    rhs: &ArrayData,
+    lhs_start: usize,
+    rhs_start: usize,
+    len: usize,
+) -> bool {
+    if lhs_start != 0
+        || rhs_start != 0
+        || (lhs.len() != len && rhs.len() != len)

Review Comment:
   If I didn't misunderstand what intention it is.
   
   ```suggestion
           || (lhs.len() != len || rhs.len() != len)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102137467


##########
arrow-array/src/array/run_array.rs:
##########
@@ -204,12 +222,36 @@ impl<R: RunEndIndexType> RunArray<R> {
                 .unwrap()
         });
 
+        // Return early if all the logical indices cannot be converted to physical indices.
+        let largest_logical_index =
+            logical_indices[*ordered_indices.last().unwrap()].as_usize();
+        if largest_logical_index >= self.len() {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "Cannot convert all logical indices to physical indices. The logical index cannot be converted is {largest_logical_index}.",
+            )));
+        }
+
+        // Skip some physical indices based on offset.
+        let skip_value = if self.offset() > 0 {
+            Self::get_physical_index_from_run_ends_array(self.run_ends(), self.offset())

Review Comment:
   just to be more specific offset >= last value of run_ends array. `len` might also refer to length of slice so wanted to be more specific.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1101583537


##########
arrow-array/src/array/run_array.rs:
##########
@@ -233,6 +275,63 @@ impl<R: RunEndIndexType> RunArray<R> {
         }
         Ok(physical_indices)
     }
+
+    /// Returns a `RunArray` with zero offset and length matching the last value
+    /// in run_ends array.
+    pub fn into_non_sliced_array(self) -> Result<Self, ArrowError> {
+        if self.data.offset() == 0 && self.data.len() == Self::logical_len(&self.run_ends)
+        {
+            return Ok(self);
+        }
+        // The physical index of original run_ends array from which the `ArrayData`is sliced.
+        let start_physical_index = Self::get_physical_index_from_run_ends_array(
+            &self.run_ends,
+            self.data.offset(),
+        )
+        .ok_or_else(|| {

Review Comment:
   They can do, but they're unsafe so that wouldn't be unsound. I think `unwrap` is perfectly fine here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102126999


##########
arrow-array/src/array/run_array.rs:
##########
@@ -233,6 +275,63 @@ impl<R: RunEndIndexType> RunArray<R> {
         }
         Ok(physical_indices)
     }
+
+    /// Returns a `RunArray` with zero offset and length matching the last value
+    /// in run_ends array.
+    pub fn into_non_sliced_array(self) -> Result<Self, ArrowError> {

Review Comment:
   moved it to IPC writer and renamed to `get_zero_offset_run_array`. Please feel free to unresolve the conversation if that does not work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102779723


##########
arrow-array/src/array/run_array.rs:
##########
@@ -204,12 +220,30 @@ impl<R: RunEndIndexType> RunArray<R> {
                 .unwrap()
         });
 
+        // Return early if all the logical indices cannot be converted to physical indices.
+        let largest_logical_index =
+            logical_indices[*ordered_indices.last().unwrap()].as_usize();
+        if largest_logical_index >= self.len() {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "Cannot convert all logical indices to physical indices. The logical index cannot be converted is {largest_logical_index}.",
+            )));
+        }
+
+        // Skip some physical indices based on offset.
+        let skip_value = if self.offset() > 0 {
+            self.get_zero_offset_physical_index(self.offset()).unwrap()
+        } else {
+            0
+        };

Review Comment:
   ```suggestion
           let skip_value = self.get_physical_index(0).unwrap_or_default();
   ```
   Perhaps?



##########
arrow-array/src/array/run_array.rs:
##########
@@ -204,12 +222,36 @@ impl<R: RunEndIndexType> RunArray<R> {
                 .unwrap()
         });
 
+        // Return early if all the logical indices cannot be converted to physical indices.
+        let largest_logical_index =

Review Comment:
   > I removed the later validation.
   
   I'm not sure you did, but that is fine :smile: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on pull request #3662: feat: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#issuecomment-1420702970

   I am changing this to draft because I just realized that the IPC writer, written in this PR, for run encoded array does not support `offset > 0` or `length < original length` in `ArrayData`. 
   
   Supporting `offset > 0` or `length < original length` **is not a trivial task**. The way `ArrayData` is `sliced` for `RunArray` will have implications across all the functions that use `ArrayData`.  For e.g. the recently merged take kernel (#3622) does not work when `offset > 0`. 
   
   My proposal for this PR: The IPC writer will panic if the `offset > 0` or `length < original length` in the input `ArrayData`. Gradually add support for sliced `ArrayData` in subsequent PRs. Let me know what you think.
   
   cc: @tustvold @viirya 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102139045


##########
arrow-array/src/array/run_array.rs:
##########
@@ -175,6 +176,16 @@ impl<R: RunEndIndexType> RunArray<R> {
         Some(st)
     }
 
+    /// Returns index to the physical array for the given index to the logical array.

Review Comment:
   ```suggestion
       /// Returns index to the physical array for the given index to the logical array, after adjusted by offset.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1101542312


##########
arrow-ipc/src/writer.rs:
##########
@@ -1242,24 +1289,45 @@ fn write_array_data(
         }
     }
 
-    if !matches!(array_data.data_type(), DataType::Dictionary(_, _)) {
-        // recursively write out nested structures
-        for data_ref in array_data.child_data() {
-            // write the nested data (e.g list data)
-            offset = write_array_data(
-                data_ref,
-                buffers,
-                arrow_data,
-                nodes,
-                offset,
-                data_ref.len(),
-                data_ref.null_count(),
-                compression_codec,
-                write_options,
-            )?;
+    match array_data.data_type() {
+        DataType::Dictionary(_, _) => {}
+        DataType::RunEndEncoded(_, _) => {
+            // unslice the run encoded array.
+            let arr = unslice_run_array(array_data.clone())?;

Review Comment:
   In case of non-zero offset, a new `RunArray` is created in `into_non_sliced_array` whose default `offset` value is zero.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102149720


##########
arrow-ipc/src/writer.rs:
##########
@@ -967,12 +1074,15 @@ fn write_continuation<W: Write>(
 }
 
 /// In V4, null types have no validity bitmap
-/// In V5 and later, null and union types have no validity bitmap
+/// In V5 and later, null, union and run end encoded types have no validity bitmap

Review Comment:
   Although the result is correct, the doc reads like run end encoded type has validity bitmap before bitmap. 😅 
   
   ```suggestion
   /// In V5 and later, null and union types have no validity bitmap
   /// Run end encoded types has no validity bitmap.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102357725


##########
arrow-data/src/equal/run.rs:
##########
@@ -0,0 +1,94 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::ArrayData;
+
+use super::equal_range;
+
+/// The current implementation of comparison of run array support partial comparison.
+/// Comparing run encoded array based on logical indices (`lhs_start`, `rhs_start`) will
+/// be time consuming as converting from logical index to physical index cannot be done

Review Comment:
   I am marking this as resolved without any changes. Feel free to unresolve it if you think otherwise.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] ursabot commented on pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#issuecomment-1425903250

   Benchmark runs are scheduled for baseline = 07e20639b7023fcc61c73f80a5bddf8715c2a06f and contender = 5b1821e0564f586f6f98e5c392a0a208890055df. 5b1821e0564f586f6f98e5c392a0a208890055df is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/fba77e998c0c4a6992ad5111ca308923...0b451d832075462a8896ea2e4b0ba84b/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/13a9d355c3c64a5a84432352c21e9438...9c04d2a702644220a7ed4e78e8398d29/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/8a32612aa20241afb0e7dc2ee97aafc3...2abe12f079ec44cbafb34696acedfc57/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/c152ba2c78c346919fedfa4a68e462ad...b36c046ca51d433293a90a7d3d25a0a7/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1101571590


##########
arrow-array/src/array/run_array.rs:
##########
@@ -233,6 +275,63 @@ impl<R: RunEndIndexType> RunArray<R> {
         }
         Ok(physical_indices)
     }
+
+    /// Returns a `RunArray` with zero offset and length matching the last value
+    /// in run_ends array.
+    pub fn into_non_sliced_array(self) -> Result<Self, ArrowError> {
+        if self.data.offset() == 0 && self.data.len() == Self::logical_len(&self.run_ends)
+        {
+            return Ok(self);
+        }
+        // The physical index of original run_ends array from which the `ArrayData`is sliced.
+        let start_physical_index = Self::get_physical_index_from_run_ends_array(
+            &self.run_ends,
+            self.data.offset(),
+        )
+        .ok_or_else(|| {

Review Comment:
   I thought the functions like `ArrayData::new_uncheked` and `ArrayBuilder::build_unchecked` can be used to create an invalid ArrayData. I don't have strong opinions here. Should we just do `unwrap`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1101558970


##########
arrow-ipc/src/writer.rs:
##########
@@ -1242,24 +1289,45 @@ fn write_array_data(
         }
     }
 
-    if !matches!(array_data.data_type(), DataType::Dictionary(_, _)) {
-        // recursively write out nested structures
-        for data_ref in array_data.child_data() {
-            // write the nested data (e.g list data)
-            offset = write_array_data(
-                data_ref,
-                buffers,
-                arrow_data,
-                nodes,
-                offset,
-                data_ref.len(),
-                data_ref.null_count(),
-                compression_codec,
-                write_options,
-            )?;
+    match array_data.data_type() {
+        DataType::Dictionary(_, _) => {}
+        DataType::RunEndEncoded(_, _) => {
+            // unslice the run encoded array.
+            let arr = unslice_run_array(array_data.clone())?;

Review Comment:
   Aah yes, I forgot that IPC doesn't have a concept of offsets, unlike the C Data interface



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1101895618


##########
arrow-array/src/array/run_array.rs:
##########
@@ -233,6 +275,63 @@ impl<R: RunEndIndexType> RunArray<R> {
         }
         Ok(physical_indices)
     }
+
+    /// Returns a `RunArray` with zero offset and length matching the last value
+    /// in run_ends array.
+    pub fn into_non_sliced_array(self) -> Result<Self, ArrowError> {

Review Comment:
   +1 It looks a bit weird to have it as public api. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1101398368


##########
arrow-array/src/array/run_array.rs:
##########
@@ -144,15 +144,16 @@ impl<R: RunEndIndexType> RunArray<R> {
         })
     }
 
-    /// Returns index to the physical array for the given index to the logical array.
-    /// Performs a binary search on the run_ends array for the input index.
     #[inline]
-    pub fn get_physical_index(&self, logical_index: usize) -> Option<usize> {
-        if logical_index >= self.len() {
+    fn get_physical_index_from_run_ends_array(
+        run_ends: &PrimitiveArray<R>,

Review Comment:
   This argument only ever appears to be given `self.run_ends`? Is it necessary?



##########
arrow-array/src/array/run_array.rs:
##########
@@ -233,6 +275,63 @@ impl<R: RunEndIndexType> RunArray<R> {
         }
         Ok(physical_indices)
     }
+
+    /// Returns a `RunArray` with zero offset and length matching the last value
+    /// in run_ends array.
+    pub fn into_non_sliced_array(self) -> Result<Self, ArrowError> {

Review Comment:
   I think I would prefer this live alongside the IPC implementation, as it is effectively an optimisation for the writer



##########
arrow-array/src/array/run_array.rs:
##########
@@ -233,6 +275,63 @@ impl<R: RunEndIndexType> RunArray<R> {
         }
         Ok(physical_indices)
     }
+
+    /// Returns a `RunArray` with zero offset and length matching the last value
+    /// in run_ends array.
+    pub fn into_non_sliced_array(self) -> Result<Self, ArrowError> {
+        if self.data.offset() == 0 && self.data.len() == Self::logical_len(&self.run_ends)
+        {
+            return Ok(self);
+        }
+        // The physical index of original run_ends array from which the `ArrayData`is sliced.
+        let start_physical_index = Self::get_physical_index_from_run_ends_array(
+            &self.run_ends,
+            self.data.offset(),
+        )
+        .ok_or_else(|| {
+            ArrowError::InvalidArgumentError(format!(
+                "Cannot convert the offset {} to physical index",
+                self.data.offset()
+            ))
+        })?;
+
+        // The logical length of original run_ends array until which the `ArrayData` is sliced.
+        let end_logical_index = self.data.offset() + self.data.len() - 1;
+        // The physical index of original run_ends array until which the `ArrayData`is sliced.
+        let end_physical_index =
+            Self::get_physical_index_from_run_ends_array(&self.run_ends, end_logical_index).ok_or_else(|| {
+                ArrowError::InvalidArgumentError(format!(
+                    "Cannot convert the `offset + len - 1` {end_logical_index} to physical index"
+                ))
+            })?;
+
+        let physical_length = end_physical_index - start_physical_index + 1;
+
+        // build new run_ends array by subtrating offset from run ends.
+        let new_run_ends: PrimitiveArray<R> = self
+            .run_ends
+            .values()
+            .iter()
+            .skip(start_physical_index)
+            .take(physical_length)
+            .map(|f| f.as_usize() - self.data.offset())
+            .map(|f| f.min(self.len()))
+            .map(R::Native::from_usize)
+            .collect();
+
+        // build new values by slicing physical indices.
+        let new_values = self
+            .values
+            .slice(start_physical_index, physical_length)
+            .into_data();
+
+        let builder = ArrayDataBuilder::new(self.data_type().clone())
+            .len(self.len())
+            .add_child_data(new_run_ends.into_data())
+            .add_child_data(new_values);
+        let array_data = builder.build()?;

Review Comment:
   This should probably use `build_unchecked` to avoid doing the validation again



##########
arrow-ipc/src/writer.rs:
##########
@@ -1242,24 +1289,45 @@ fn write_array_data(
         }
     }
 
-    if !matches!(array_data.data_type(), DataType::Dictionary(_, _)) {
-        // recursively write out nested structures
-        for data_ref in array_data.child_data() {
-            // write the nested data (e.g list data)
-            offset = write_array_data(
-                data_ref,
-                buffers,
-                arrow_data,
-                nodes,
-                offset,
-                data_ref.len(),
-                data_ref.null_count(),
-                compression_codec,
-                write_options,
-            )?;
+    match array_data.data_type() {
+        DataType::Dictionary(_, _) => {}
+        DataType::RunEndEncoded(_, _) => {
+            // unslice the run encoded array.
+            let arr = unslice_run_array(array_data.clone())?;

Review Comment:
   I can't see where we alter the offset that gets written for the `RunEndEncoded` array to be 0?



##########
arrow-array/src/array/run_array.rs:
##########
@@ -233,6 +275,63 @@ impl<R: RunEndIndexType> RunArray<R> {
         }
         Ok(physical_indices)
     }
+
+    /// Returns a `RunArray` with zero offset and length matching the last value
+    /// in run_ends array.
+    pub fn into_non_sliced_array(self) -> Result<Self, ArrowError> {
+        if self.data.offset() == 0 && self.data.len() == Self::logical_len(&self.run_ends)
+        {
+            return Ok(self);
+        }
+        // The physical index of original run_ends array from which the `ArrayData`is sliced.
+        let start_physical_index = Self::get_physical_index_from_run_ends_array(
+            &self.run_ends,
+            self.data.offset(),
+        )
+        .ok_or_else(|| {
+            ArrowError::InvalidArgumentError(format!(
+                "Cannot convert the offset {} to physical index",
+                self.data.offset()
+            ))
+        })?;
+
+        // The logical length of original run_ends array until which the `ArrayData` is sliced.
+        let end_logical_index = self.data.offset() + self.data.len() - 1;
+        // The physical index of original run_ends array until which the `ArrayData`is sliced.
+        let end_physical_index =
+            Self::get_physical_index_from_run_ends_array(&self.run_ends, end_logical_index).ok_or_else(|| {
+                ArrowError::InvalidArgumentError(format!(
+                    "Cannot convert the `offset + len - 1` {end_logical_index} to physical index"
+                ))
+            })?;
+
+        let physical_length = end_physical_index - start_physical_index + 1;
+
+        // build new run_ends array by subtrating offset from run ends.
+        let new_run_ends: PrimitiveArray<R> = self
+            .run_ends
+            .values()
+            .iter()
+            .skip(start_physical_index)
+            .take(physical_length)
+            .map(|f| f.as_usize() - self.data.offset())
+            .map(|f| f.min(self.len()))
+            .map(R::Native::from_usize)

Review Comment:
   I think this will convert overflow to `None` which is not correct



##########
arrow-data/src/equal/run.rs:
##########
@@ -0,0 +1,94 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::ArrayData;
+
+use super::equal_range;
+
+/// The current implementation of comparison of run array support partial comparison.
+/// Comparing run encoded array based on logical indices (`lhs_start`, `rhs_start`) will
+/// be time consuming as converting from logical index to physical index cannot be done
+/// in constat time. The current comparison compares the underlying physical arrays.

Review Comment:
   ```suggestion
   /// in constant time. The current comparison compares the underlying physical arrays.
   ```



##########
arrow-data/src/equal/run.rs:
##########
@@ -0,0 +1,94 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::ArrayData;
+
+use super::equal_range;
+
+/// The current implementation of comparison of run array support partial comparison.
+/// Comparing run encoded array based on logical indices (`lhs_start`, `rhs_start`) will
+/// be time consuming as converting from logical index to physical index cannot be done
+/// in constat time. The current comparison compares the underlying physical arrays.
+pub(super) fn run_equal(
+    lhs: &ArrayData,
+    rhs: &ArrayData,
+    lhs_start: usize,
+    rhs_start: usize,
+    len: usize,
+) -> bool {
+    if lhs_start != 0
+        || rhs_start != 0
+        || (lhs.len() != len && rhs.len() != len)
+        || lhs.offset() > 0
+        || rhs.offset() > 0
+    {
+        unimplemented!("Partial comparison for run array not supported.")

Review Comment:
   ```suggestion
           unimplemented!("Logical comparison for run array not supported.")
   ```



##########
arrow-data/src/data.rs:
##########
@@ -1514,9 +1514,10 @@ impl ArrayData {
             Ok(())
         })?;
 
-        if prev_value.as_usize() != array_len {
+        if prev_value.as_usize() < (self.offset + self.len) {

Review Comment:
   :+1:



##########
arrow-array/src/array/run_array.rs:
##########
@@ -204,12 +222,36 @@ impl<R: RunEndIndexType> RunArray<R> {
                 .unwrap()
         });
 
+        // Return early if all the logical indices cannot be converted to physical indices.
+        let largest_logical_index =

Review Comment:
   Does this remove the need for the later check https://github.com/apache/arrow-rs/pull/3662/files#diff-5a6a759d325c1c9512125fcb8fc9dad89242f61e9fca5cffaa9297387cf94d07R269



##########
arrow-ipc/regen.sh:
##########
@@ -18,15 +18,13 @@
 
 DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
 
-# Change to the toplevel Rust directory
-pushd $DIR/../../
+# Change to the toplevel `arrow-rs` directory

Review Comment:
   Thank you for updating this script, I've just been doing it manually :sweat_smile: 



##########
arrow-array/src/array/run_array.rs:
##########
@@ -552,6 +651,33 @@ mod tests {
         result
     }
 
+    fn compare_logical_and_physical_indices(

Review Comment:
   ```suggestion
       /// Asserts that `logical_array[logical_indices[*]] == physical_array[physical_indices[*]]`
       fn compare_logical_and_physical_indices(
   ```



##########
arrow-ipc/src/writer.rs:
##########
@@ -218,6 +218,30 @@ impl IpcDataGenerator {
                     )?;
                 }
             }
+            DataType::RunEndEncoded(run_ends, values) => {
+                if column.data().child_data().len() != 2 {
+                    return Err(ArrowError::InvalidArgumentError(format!(
+                        "The run encoded array should have exactly two child arrays. Found {}",
+                        column.data().child_data().len()
+                    )));
+                }
+                let run_ends_array = make_array(column.data().child_data()[0].clone());
+                let values_array = make_array(column.data().child_data()[1].clone());
+                self.encode_dictionaries(

Review Comment:
   I don't think the run ends can be dictionary encoded, so I think we could skip this for the first child



##########
arrow-array/src/array/run_array.rs:
##########
@@ -233,6 +275,63 @@ impl<R: RunEndIndexType> RunArray<R> {
         }
         Ok(physical_indices)
     }
+
+    /// Returns a `RunArray` with zero offset and length matching the last value
+    /// in run_ends array.
+    pub fn into_non_sliced_array(self) -> Result<Self, ArrowError> {
+        if self.data.offset() == 0 && self.data.len() == Self::logical_len(&self.run_ends)
+        {
+            return Ok(self);
+        }
+        // The physical index of original run_ends array from which the `ArrayData`is sliced.
+        let start_physical_index = Self::get_physical_index_from_run_ends_array(
+            &self.run_ends,
+            self.data.offset(),
+        )
+        .ok_or_else(|| {

Review Comment:
   FWIW these could panic, as the ArrayData should be valid



##########
arrow-data/src/equal/run.rs:
##########
@@ -0,0 +1,94 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::ArrayData;
+
+use super::equal_range;
+
+/// The current implementation of comparison of run array support partial comparison.
+/// Comparing run encoded array based on logical indices (`lhs_start`, `rhs_start`) will
+/// be time consuming as converting from logical index to physical index cannot be done

Review Comment:
   Perhaps we could iterate over the physical indices in each array, and then use equal_range to compare the underlying values, yes this will be slow, but we do something similar for other nested types and it hasn't caused issues



##########
arrow-array/src/array/run_array.rs:
##########
@@ -233,6 +275,63 @@ impl<R: RunEndIndexType> RunArray<R> {
         }
         Ok(physical_indices)
     }
+
+    /// Returns a `RunArray` with zero offset and length matching the last value
+    /// in run_ends array.
+    pub fn into_non_sliced_array(self) -> Result<Self, ArrowError> {
+        if self.data.offset() == 0 && self.data.len() == Self::logical_len(&self.run_ends)
+        {
+            return Ok(self);
+        }
+        // The physical index of original run_ends array from which the `ArrayData`is sliced.
+        let start_physical_index = Self::get_physical_index_from_run_ends_array(
+            &self.run_ends,
+            self.data.offset(),
+        )
+        .ok_or_else(|| {
+            ArrowError::InvalidArgumentError(format!(
+                "Cannot convert the offset {} to physical index",
+                self.data.offset()
+            ))
+        })?;
+
+        // The logical length of original run_ends array until which the `ArrayData` is sliced.
+        let end_logical_index = self.data.offset() + self.data.len() - 1;
+        // The physical index of original run_ends array until which the `ArrayData`is sliced.
+        let end_physical_index =
+            Self::get_physical_index_from_run_ends_array(&self.run_ends, end_logical_index).ok_or_else(|| {
+                ArrowError::InvalidArgumentError(format!(
+                    "Cannot convert the `offset + len - 1` {end_logical_index} to physical index"
+                ))
+            })?;
+
+        let physical_length = end_physical_index - start_physical_index + 1;
+
+        // build new run_ends array by subtrating offset from run ends.
+        let new_run_ends: PrimitiveArray<R> = self
+            .run_ends
+            .values()
+            .iter()
+            .skip(start_physical_index)
+            .take(physical_length)
+            .map(|f| f.as_usize() - self.data.offset())
+            .map(|f| f.min(self.len()))

Review Comment:
   If I'm not mistaken this is to handle the final offset, perhaps we could special case this and use `PrimitiveBuilder`



##########
arrow-array/src/array/run_array.rs:
##########
@@ -233,6 +275,63 @@ impl<R: RunEndIndexType> RunArray<R> {
         }
         Ok(physical_indices)
     }
+
+    /// Returns a `RunArray` with zero offset and length matching the last value
+    /// in run_ends array.
+    pub fn into_non_sliced_array(self) -> Result<Self, ArrowError> {
+        if self.data.offset() == 0 && self.data.len() == Self::logical_len(&self.run_ends)
+        {
+            return Ok(self);
+        }
+        // The physical index of original run_ends array from which the `ArrayData`is sliced.
+        let start_physical_index = Self::get_physical_index_from_run_ends_array(
+            &self.run_ends,
+            self.data.offset(),
+        )
+        .ok_or_else(|| {
+            ArrowError::InvalidArgumentError(format!(
+                "Cannot convert the offset {} to physical index",
+                self.data.offset()
+            ))
+        })?;
+
+        // The logical length of original run_ends array until which the `ArrayData` is sliced.
+        let end_logical_index = self.data.offset() + self.data.len() - 1;
+        // The physical index of original run_ends array until which the `ArrayData`is sliced.
+        let end_physical_index =
+            Self::get_physical_index_from_run_ends_array(&self.run_ends, end_logical_index).ok_or_else(|| {
+                ArrowError::InvalidArgumentError(format!(
+                    "Cannot convert the `offset + len - 1` {end_logical_index} to physical index"
+                ))
+            })?;
+
+        let physical_length = end_physical_index - start_physical_index + 1;
+
+        // build new run_ends array by subtrating offset from run ends.
+        let new_run_ends: PrimitiveArray<R> = self
+            .run_ends
+            .values()
+            .iter()
+            .skip(start_physical_index)
+            .take(physical_length)
+            .map(|f| f.as_usize() - self.data.offset())
+            .map(|f| f.min(self.len()))
+            .map(R::Native::from_usize)
+            .collect();

Review Comment:
   FWIW using `BufferBuilder` directly will be significantly faster, as it doesn't need to handle nulls, LLVM is not that smart at optimising iterators



##########
arrow-data/src/equal/run.rs:
##########
@@ -0,0 +1,94 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::ArrayData;
+
+use super::equal_range;
+
+/// The current implementation of comparison of run array support partial comparison.
+/// Comparing run encoded array based on logical indices (`lhs_start`, `rhs_start`) will
+/// be time consuming as converting from logical index to physical index cannot be done
+/// in constat time. The current comparison compares the underlying physical arrays.
+pub(super) fn run_equal(
+    lhs: &ArrayData,
+    rhs: &ArrayData,
+    lhs_start: usize,
+    rhs_start: usize,
+    len: usize,
+) -> bool {
+    if lhs_start != 0
+        || rhs_start != 0
+        || (lhs.len() != len && rhs.len() != len)
+        || lhs.offset() > 0
+        || rhs.offset() > 0
+    {
+        unimplemented!("Partial comparison for run array not supported.")

Review Comment:
   FWIW I think it would be really good to support logical comparison, even if the first version is slow, as this is consistent with the other arrays



##########
arrow-array/src/array/run_array.rs:
##########
@@ -233,6 +275,63 @@ impl<R: RunEndIndexType> RunArray<R> {
         }
         Ok(physical_indices)
     }
+
+    /// Returns a `RunArray` with zero offset and length matching the last value
+    /// in run_ends array.
+    pub fn into_non_sliced_array(self) -> Result<Self, ArrowError> {

Review Comment:
   ```suggestion
       pub fn materialize_array_offset(self) -> Result<Self, ArrowError> {
   ```
   
   Or something, if we move this to be an implementation detail of IPC this effectively becomes an implementation detail



##########
arrow-ipc/src/reader.rs:
##########
@@ -194,6 +194,50 @@ fn create_array(
             };
             Arc::new(struct_array)
         }
+        RunEndEncoded(run_ends_field, values_field) => {
+            let run_node = nodes.get(node_index);
+            node_index += 1;
+
+            let run_ends_triple = create_array(
+                nodes,
+                run_ends_field,
+                data,
+                buffers,
+                dictionaries_by_id,
+                node_index,
+                buffer_index,
+                compression_codec,
+                metadata,
+            )?;
+            node_index = run_ends_triple.1;
+            buffer_index = run_ends_triple.2;
+
+            let values_triple = create_array(
+                nodes,
+                values_field,
+                data,
+                buffers,
+                dictionaries_by_id,
+                node_index,
+                buffer_index,
+                compression_codec,
+                metadata,
+            )?;
+            node_index = values_triple.1;
+            buffer_index = values_triple.2;
+
+            let run_array_length = run_node.length() as usize;
+            let run_array_null_count = run_node.null_count() as usize;
+            let data = ArrayData::builder(data_type.clone())
+                .len(run_array_length)
+                .null_count(run_array_null_count)
+                .offset(0)
+                .add_child_data(run_ends_triple.0.into_data())
+                .add_child_data(values_triple.0.into_data())
+                .build()?;

Review Comment:
   :+1: 



##########
arrow-array/src/array/run_array.rs:
##########
@@ -552,6 +651,33 @@ mod tests {
         result
     }
 
+    fn compare_logical_and_physical_indices(

Review Comment:
   I found it very hard to work out what this method was doing, some docstrings would go a long way



##########
arrow-data/src/equal/run.rs:
##########
@@ -0,0 +1,94 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::ArrayData;
+
+use super::equal_range;
+
+/// The current implementation of comparison of run array support partial comparison.

Review Comment:
   ```suggestion
   /// The current implementation of comparison of run array support physical comparison.
   ```



##########
arrow-array/src/array/run_array.rs:
##########
@@ -824,23 +998,82 @@ mod tests {
             assert_eq!(logical_indices.len(), physical_indices.len());
 
             // check value in logical index in the input_array matches physical index in typed_run_array
-            logical_indices
+            compare_logical_and_physical_indices(
+                &logical_indices,
+                &input_array,
+                &physical_indices,
+                physical_values_array,
+            );
+        }
+    }
+
+    #[test]
+    fn test_get_physical_indices_sliced() {
+        let total_len = 80;
+        let input_array = build_input_array(total_len);
+
+        // Encode the input_array to run array
+        let mut builder =
+            PrimitiveRunBuilder::<Int16Type, Int32Type>::with_capacity(input_array.len());
+        builder.extend(input_array.iter().copied());
+        let run_array = builder.finish();
+        let physical_values_array = as_primitive_array::<Int32Type>(run_array.values());
+
+        // test for all slice lengths.
+        for slice_len in 1..=total_len {
+            // create an array consisting of all the indices repeated twice and shuffled.
+            let mut logical_indices: Vec<u32> = (0_u32..(slice_len as u32)).collect();
+            // add same indices once more
+            logical_indices.append(&mut logical_indices.clone());
+            let mut rng = thread_rng();
+            logical_indices.shuffle(&mut rng);
+
+            // test for offset = 0 and slice length = slice_len
+            // slice the input array using which the run array was built.
+            let sliced_input_array: Vec<Option<i32>> =
+                input_array.iter().take(slice_len).copied().collect();
+
+            // slice the run array
+            let sliced_run_array: RunArray<Int16Type> =
+                run_array.slice(0, slice_len).into_data().into();
+
+            // Get physical indices.
+            let physical_indices = sliced_run_array
+                .get_physical_indices(&logical_indices)
+                .unwrap();
+
+            compare_logical_and_physical_indices(
+                &logical_indices,
+                &sliced_input_array,
+                &physical_indices,
+                physical_values_array,
+            );
+
+            // test for offset = total_len - slice_len and slice length = slice_len
+            // slice the input array using which the run array was built.
+            let sliced_input_array: Vec<Option<i32>> = input_array

Review Comment:
   ```suggestion
               let sliced_input_array = &input_array[total_len - slice_len..total_len]
   ```



##########
arrow-data/src/equal/run.rs:
##########
@@ -0,0 +1,94 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::ArrayData;
+
+use super::equal_range;
+
+/// The current implementation of comparison of run array support partial comparison.
+/// Comparing run encoded array based on logical indices (`lhs_start`, `rhs_start`) will
+/// be time consuming as converting from logical index to physical index cannot be done
+/// in constat time. The current comparison compares the underlying physical arrays.
+pub(super) fn run_equal(
+    lhs: &ArrayData,
+    rhs: &ArrayData,
+    lhs_start: usize,
+    rhs_start: usize,
+    len: usize,
+) -> bool {
+    if lhs_start != 0
+        || rhs_start != 0
+        || (lhs.len() != len && rhs.len() != len)
+        || lhs.offset() > 0
+        || rhs.offset() > 0
+    {
+        unimplemented!("Partial comparison for run array not supported.")
+    }
+
+    if lhs.len() != rhs.len() {
+        return false;
+    }
+
+    // This method does validation of the lhs array and rhs array required to do its
+    // function. This method does not ensure the validity of lhs and rhs array as run array.

Review Comment:
   We can safely assume validity, there shouldn't be a safe way to create a invalid ArrayData



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#issuecomment-1425911619

   > @tustvold @viirya Where can I see the benchmark results?
   
   I don't believe ursabot is setup to benchmark the Rust implementation, I think there was a ticket but I can't seem to find it...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102125859


##########
arrow-array/src/array/run_array.rs:
##########
@@ -204,12 +222,36 @@ impl<R: RunEndIndexType> RunArray<R> {
                 .unwrap()
         });
 
+        // Return early if all the logical indices cannot be converted to physical indices.
+        let largest_logical_index =

Review Comment:
   I removed the later validation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102142506


##########
arrow-data/src/data.rs:
##########
@@ -1514,9 +1514,10 @@ impl ArrayData {
             Ok(())
         })?;
 
-        if prev_value.as_usize() != array_len {
+        if prev_value.as_usize() < (self.offset + self.len) {
             return Err(ArrowError::InvalidArgumentError(format!(
-                "The length of array does not match the last value in the run_ends array. The last value of run_ends array is {prev_value} and length of array is {array_len}."
+                "The offset + length of array should be lte last value in the run_ends array. The last value of run_ends array is {prev_value} and offset + length of array is {}.",

Review Comment:
   ```suggestion
                   "The offset + length of array should be less or equal last value in the run_ends array. The last value of run_ends array is {prev_value} and offset + length of array is {}.",
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1101519614


##########
arrow-data/src/equal/run.rs:
##########
@@ -0,0 +1,94 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::ArrayData;
+
+use super::equal_range;
+
+/// The current implementation of comparison of run array support partial comparison.
+/// Comparing run encoded array based on logical indices (`lhs_start`, `rhs_start`) will
+/// be time consuming as converting from logical index to physical index cannot be done

Review Comment:
   May be in a subsequent PR? I was trying to do logical comparison and noticed that the crate does not depend on `arrow-array`. So iterating over `run_ends` array values means some of the `PrimitiveArray` work has to be duplicated here or we need to add `arrow-array` as a dependent crate. Either way I thought might be suitable for another PR as this PR is more about IPC support.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1101523281


##########
arrow-array/src/array/run_array.rs:
##########
@@ -233,6 +275,63 @@ impl<R: RunEndIndexType> RunArray<R> {
         }
         Ok(physical_indices)
     }
+
+    /// Returns a `RunArray` with zero offset and length matching the last value
+    /// in run_ends array.
+    pub fn into_non_sliced_array(self) -> Result<Self, ArrowError> {
+        if self.data.offset() == 0 && self.data.len() == Self::logical_len(&self.run_ends)
+        {
+            return Ok(self);
+        }
+        // The physical index of original run_ends array from which the `ArrayData`is sliced.
+        let start_physical_index = Self::get_physical_index_from_run_ends_array(
+            &self.run_ends,
+            self.data.offset(),
+        )
+        .ok_or_else(|| {
+            ArrowError::InvalidArgumentError(format!(
+                "Cannot convert the offset {} to physical index",
+                self.data.offset()
+            ))
+        })?;
+
+        // The logical length of original run_ends array until which the `ArrayData` is sliced.
+        let end_logical_index = self.data.offset() + self.data.len() - 1;
+        // The physical index of original run_ends array until which the `ArrayData`is sliced.
+        let end_physical_index =
+            Self::get_physical_index_from_run_ends_array(&self.run_ends, end_logical_index).ok_or_else(|| {
+                ArrowError::InvalidArgumentError(format!(
+                    "Cannot convert the `offset + len - 1` {end_logical_index} to physical index"
+                ))
+            })?;
+
+        let physical_length = end_physical_index - start_physical_index + 1;
+
+        // build new run_ends array by subtrating offset from run ends.
+        let new_run_ends: PrimitiveArray<R> = self
+            .run_ends
+            .values()
+            .iter()
+            .skip(start_physical_index)
+            .take(physical_length)
+            .map(|f| f.as_usize() - self.data.offset())
+            .map(|f| f.min(self.len()))
+            .map(R::Native::from_usize)

Review Comment:
   It shouldn't. The `usize` is created from `R::Native` couple of lines above and hence cannot overflow.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1101535068


##########
arrow-array/src/array/run_array.rs:
##########
@@ -204,12 +222,36 @@ impl<R: RunEndIndexType> RunArray<R> {
                 .unwrap()
         });
 
+        // Return early if all the logical indices cannot be converted to physical indices.
+        let largest_logical_index =

Review Comment:
   Yes, I just left it there to catch any unforeseen issues. I can remove it if you have strong opinion on it. But I don't see any issue retaining it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102126041


##########
arrow-array/src/array/run_array.rs:
##########
@@ -233,6 +275,63 @@ impl<R: RunEndIndexType> RunArray<R> {
         }
         Ok(physical_indices)
     }
+
+    /// Returns a `RunArray` with zero offset and length matching the last value
+    /// in run_ends array.
+    pub fn into_non_sliced_array(self) -> Result<Self, ArrowError> {

Review Comment:
   Moved it to writer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102350756


##########
arrow-data/src/equal/run.rs:
##########
@@ -0,0 +1,84 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::ArrayData;
+
+use super::equal_range;
+
+/// The current implementation of comparison of run array support physical comparison.
+/// Comparing run encoded array based on logical indices (`lhs_start`, `rhs_start`) will
+/// be time consuming as converting from logical index to physical index cannot be done
+/// in constant time. The current comparison compares the underlying physical arrays.
+pub(super) fn run_equal(
+    lhs: &ArrayData,
+    rhs: &ArrayData,
+    lhs_start: usize,
+    rhs_start: usize,
+    len: usize,
+) -> bool {
+    if lhs_start != 0
+        || rhs_start != 0
+        || (lhs.len() != len && rhs.len() != len)

Review Comment:
   In order for not to panic, the length should match at least one of the array. The length mismatch will cause a `false` return value rather than panic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1101508987


##########
arrow-array/src/array/run_array.rs:
##########
@@ -144,15 +144,16 @@ impl<R: RunEndIndexType> RunArray<R> {
         })
     }
 
-    /// Returns index to the physical array for the given index to the logical array.
-    /// Performs a binary search on the run_ends array for the input index.
     #[inline]
-    pub fn get_physical_index(&self, logical_index: usize) -> Option<usize> {
-        if logical_index >= self.len() {
+    fn get_physical_index_from_run_ends_array(
+        run_ends: &PrimitiveArray<R>,

Review Comment:
   My thought process was, `self` should be based on slice. The function acts on `run_ends` array irrespective of `offset` and `length` in the `ArrayData`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102356502


##########
arrow-data/src/equal/run.rs:
##########
@@ -0,0 +1,84 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::ArrayData;
+
+use super::equal_range;
+
+/// The current implementation of comparison of run array support physical comparison.
+/// Comparing run encoded array based on logical indices (`lhs_start`, `rhs_start`) will
+/// be time consuming as converting from logical index to physical index cannot be done
+/// in constant time. The current comparison compares the underlying physical arrays.
+pub(super) fn run_equal(
+    lhs: &ArrayData,
+    rhs: &ArrayData,
+    lhs_start: usize,
+    rhs_start: usize,
+    len: usize,
+) -> bool {
+    if lhs_start != 0
+        || rhs_start != 0
+        || (lhs.len() != len && rhs.len() != len)

Review Comment:
   I am not changing anything. The validation you are looking for happens couple of lines below the line highlighted in this suggestion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102128006


##########
arrow-array/src/array/run_array.rs:
##########
@@ -204,12 +222,36 @@ impl<R: RunEndIndexType> RunArray<R> {
                 .unwrap()
         });
 
+        // Return early if all the logical indices cannot be converted to physical indices.
+        let largest_logical_index =
+            logical_indices[*ordered_indices.last().unwrap()].as_usize();
+        if largest_logical_index >= self.len() {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "Cannot convert all logical indices to physical indices. The logical index cannot be converted is {largest_logical_index}.",
+            )));
+        }
+
+        // Skip some physical indices based on offset.
+        let skip_value = if self.offset() > 0 {
+            Self::get_physical_index_from_run_ends_array(self.run_ends(), self.offset())

Review Comment:
   Yes. If the `ArrayData` is validated then there should not be any issues.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102132630


##########
arrow-ipc/regen.sh:
##########
@@ -18,15 +18,13 @@
 
 DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
 
-# Change to the toplevel Rust directory
-pushd $DIR/../../
+# Change to the toplevel `arrow-rs` directory

Review Comment:
   no problem.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#issuecomment-1425907692

   > Benchmark runs are scheduled for baseline = https://github.com/apache/arrow-rs/commit/07e20639b7023fcc61c73f80a5bddf8715c2a06f and contender = https://github.com/apache/arrow-rs/commit/5b1821e0564f586f6f98e5c392a0a208890055df. https://github.com/apache/arrow-rs/commit/5b1821e0564f586f6f98e5c392a0a208890055df is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   
   
   @tustvold @viirya  Where can I see the benchmark results?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1101527211


##########
arrow-array/src/array/run_array.rs:
##########
@@ -233,6 +275,63 @@ impl<R: RunEndIndexType> RunArray<R> {
         }
         Ok(physical_indices)
     }
+
+    /// Returns a `RunArray` with zero offset and length matching the last value
+    /// in run_ends array.
+    pub fn into_non_sliced_array(self) -> Result<Self, ArrowError> {
+        if self.data.offset() == 0 && self.data.len() == Self::logical_len(&self.run_ends)
+        {
+            return Ok(self);
+        }
+        // The physical index of original run_ends array from which the `ArrayData`is sliced.
+        let start_physical_index = Self::get_physical_index_from_run_ends_array(
+            &self.run_ends,
+            self.data.offset(),
+        )
+        .ok_or_else(|| {
+            ArrowError::InvalidArgumentError(format!(
+                "Cannot convert the offset {} to physical index",
+                self.data.offset()
+            ))
+        })?;
+
+        // The logical length of original run_ends array until which the `ArrayData` is sliced.
+        let end_logical_index = self.data.offset() + self.data.len() - 1;
+        // The physical index of original run_ends array until which the `ArrayData`is sliced.
+        let end_physical_index =
+            Self::get_physical_index_from_run_ends_array(&self.run_ends, end_logical_index).ok_or_else(|| {
+                ArrowError::InvalidArgumentError(format!(
+                    "Cannot convert the `offset + len - 1` {end_logical_index} to physical index"
+                ))
+            })?;
+
+        let physical_length = end_physical_index - start_physical_index + 1;
+
+        // build new run_ends array by subtrating offset from run ends.
+        let new_run_ends: PrimitiveArray<R> = self
+            .run_ends
+            .values()
+            .iter()
+            .skip(start_physical_index)
+            .take(physical_length)
+            .map(|f| f.as_usize() - self.data.offset())
+            .map(|f| f.min(self.len()))

Review Comment:
   Yes. I thought the current code was more readable than using a Builder. But it make sense not to do min operation on all values. I'll use a builder.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102787638


##########
arrow-array/src/array/run_array.rs:
##########
@@ -204,12 +220,30 @@ impl<R: RunEndIndexType> RunArray<R> {
                 .unwrap()
         });
 
+        // Return early if all the logical indices cannot be converted to physical indices.
+        let largest_logical_index =
+            logical_indices[*ordered_indices.last().unwrap()].as_usize();
+        if largest_logical_index >= self.len() {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "Cannot convert all logical indices to physical indices. The logical index cannot be converted is {largest_logical_index}.",
+            )));
+        }
+
+        // Skip some physical indices based on offset.
+        let skip_value = if self.offset() > 0 {
+            self.get_zero_offset_physical_index(self.offset()).unwrap()
+        } else {
+            0
+        };

Review Comment:
   No. We should get a valid response back. Panic or error is the better option. I used to have error, but based on your earlier feedback changed it to panic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102121106


##########
arrow-array/src/array/run_array.rs:
##########
@@ -204,12 +222,36 @@ impl<R: RunEndIndexType> RunArray<R> {
                 .unwrap()
         });
 
+        // Return early if all the logical indices cannot be converted to physical indices.
+        let largest_logical_index =
+            logical_indices[*ordered_indices.last().unwrap()].as_usize();
+        if largest_logical_index >= self.len() {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "Cannot convert all logical indices to physical indices. The logical index cannot be converted is {largest_logical_index}.",
+            )));
+        }
+
+        // Skip some physical indices based on offset.
+        let skip_value = if self.offset() > 0 {
+            Self::get_physical_index_from_run_ends_array(self.run_ends(), self.offset())

Review Comment:
   When this could happen? `offset` larger than `len`? Is it possible?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "askoa (via GitHub)" <gi...@apache.org>.
askoa commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102132415


##########
arrow-data/src/equal/run.rs:
##########
@@ -0,0 +1,94 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::ArrayData;
+
+use super::equal_range;
+
+/// The current implementation of comparison of run array support partial comparison.
+/// Comparing run encoded array based on logical indices (`lhs_start`, `rhs_start`) will
+/// be time consuming as converting from logical index to physical index cannot be done
+/// in constat time. The current comparison compares the underlying physical arrays.
+pub(super) fn run_equal(
+    lhs: &ArrayData,
+    rhs: &ArrayData,
+    lhs_start: usize,
+    rhs_start: usize,
+    len: usize,
+) -> bool {
+    if lhs_start != 0
+        || rhs_start != 0
+        || (lhs.len() != len && rhs.len() != len)
+        || lhs.offset() > 0
+        || rhs.offset() > 0
+    {
+        unimplemented!("Partial comparison for run array not supported.")

Review Comment:
   Explained my reason in another comment. Will mark this as resolved.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3662: feat + fix: IPC support for run encoded array.

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3662:
URL: https://github.com/apache/arrow-rs/pull/3662#discussion_r1102144006


##########
arrow-data/src/equal/run.rs:
##########
@@ -0,0 +1,84 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::ArrayData;
+
+use super::equal_range;
+
+/// The current implementation of comparison of run array support physical comparison.
+/// Comparing run encoded array based on logical indices (`lhs_start`, `rhs_start`) will
+/// be time consuming as converting from logical index to physical index cannot be done
+/// in constant time. The current comparison compares the underlying physical arrays.
+pub(super) fn run_equal(
+    lhs: &ArrayData,
+    rhs: &ArrayData,
+    lhs_start: usize,
+    rhs_start: usize,
+    len: usize,
+) -> bool {
+    if lhs_start != 0
+        || rhs_start != 0
+        || (lhs.len() != len && rhs.len() != len)

Review Comment:
   ```suggestion
           || (lhs.len() != len || rhs.len() != len)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org