You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/21 21:42:38 UTC

[GitHub] [arrow-rs] viirya opened a new pull request #1469: Issue 67

viirya opened a new pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469


   # Which issue does this PR close?
   
   <!---
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #67.
   
   # Rationale for this change
    
    <!---
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   # What changes are included in this PR?
   
   <!---
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   
   
   <!---
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r835960131



##########
File path: arrow/src/array/equal/union.rs
##########
@@ -0,0 +1,171 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::datatypes::Field;
+use crate::{
+    array::ArrayData, buffer::Buffer, datatypes::DataType, datatypes::UnionMode,
+};
+
+use super::{
+    equal_range, equal_values, utils::child_logical_null_buffer, utils::equal_nulls,
+};
+
+// Checks if corresponding slots in two UnionArrays are same data types
+fn equal_types(

Review comment:
       Hmm, even type_ids are different, isn't it still possible that the union arrays are equal?
   
   For example, the fields are int, int, float, string. If lhs type_ids is 1, but rhs type_id is 0, they are both int. We don't know if the actual element at child arrays are equal or not.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r839292775



##########
File path: arrow/src/array/equal/union.rs
##########
@@ -0,0 +1,171 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::datatypes::Field;
+use crate::{
+    array::ArrayData, buffer::Buffer, datatypes::DataType, datatypes::UnionMode,
+};
+
+use super::{
+    equal_range, equal_values, utils::child_logical_null_buffer, utils::equal_nulls,
+};
+
+// Checks if corresponding slots in two UnionArrays are same data types
+fn equal_types(
+    lhs_fields: &[Field],
+    rhs_fields: &[Field],
+    lhs_type_ids: &[i8],
+    rhs_type_ids: &[i8],
+) -> bool {
+    let lhs_slots_types = lhs_type_ids
+        .iter()
+        .map(|type_id| lhs_fields.get(*type_id as usize).unwrap().data_type());
+
+    let rhs_slots_types = rhs_type_ids
+        .iter()
+        .map(|type_id| rhs_fields.get(*type_id as usize).unwrap().data_type());
+
+    lhs_slots_types.zip(rhs_slots_types).all(|(l, r)| l == r)
+}
+
+fn equal_dense(
+    lhs: &ArrayData,
+    rhs: &ArrayData,
+    lhs_type_ids: &[i8],
+    rhs_type_ids: &[i8],
+    lhs_offsets: &[i32],
+    rhs_offsets: &[i32],
+) -> bool {
+    let offsets = lhs_offsets.iter().zip(rhs_offsets.iter());
+
+    lhs_type_ids
+        .iter()
+        .zip(rhs_type_ids.iter())
+        .zip(offsets)
+        .all(|((l_type_id, r_type_id), (l_offset, r_offset))| {
+            let lhs_values = &lhs.child_data()[*l_type_id as usize];
+            let rhs_values = &rhs.child_data()[*r_type_id as usize];
+
+            equal_values(
+                lhs_values,
+                rhs_values,
+                None,
+                None,
+                *l_offset as usize,
+                *r_offset as usize,
+                1,
+            )
+        })
+}
+
+#[allow(clippy::too_many_arguments)]
+fn equal_sparse(

Review comment:
       `equal_sparse` was simplified to compare corresponding `child_data` at given positions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r839296915



##########
File path: arrow/src/array/equal/union.rs
##########
@@ -0,0 +1,171 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::datatypes::Field;
+use crate::{
+    array::ArrayData, buffer::Buffer, datatypes::DataType, datatypes::UnionMode,
+};
+
+use super::{
+    equal_range, equal_values, utils::child_logical_null_buffer, utils::equal_nulls,
+};
+
+// Checks if corresponding slots in two UnionArrays are same data types
+fn equal_types(

Review comment:
       `equal_types` was removed now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#issuecomment-1079903468


   I will try and review this PR more carefully tomorrow


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r836182203



##########
File path: arrow/src/array/equal/union.rs
##########
@@ -0,0 +1,171 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::datatypes::Field;
+use crate::{
+    array::ArrayData, buffer::Buffer, datatypes::DataType, datatypes::UnionMode,
+};
+
+use super::{
+    equal_range, equal_values, utils::child_logical_null_buffer, utils::equal_nulls,
+};
+
+// Checks if corresponding slots in two UnionArrays are same data types
+fn equal_types(

Review comment:
       In the case of two fields with the same type, I believe these must still have distinct field names and therefore there is a logical difference between values stored in one vs the other. I'm not sure about this though, possibly worth an upstream clarification 🤔
   
   On the case of different field ordering the arrays can never compare equal as they have different data types.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r837807789



##########
File path: arrow/src/array/equal/union.rs
##########
@@ -0,0 +1,171 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::datatypes::Field;
+use crate::{
+    array::ArrayData, buffer::Buffer, datatypes::DataType, datatypes::UnionMode,
+};
+
+use super::{
+    equal_range, equal_values, utils::child_logical_null_buffer, utils::equal_nulls,
+};
+
+// Checks if corresponding slots in two UnionArrays are same data types
+fn equal_types(
+    lhs_fields: &[Field],
+    rhs_fields: &[Field],
+    lhs_type_ids: &[i8],
+    rhs_type_ids: &[i8],
+) -> bool {
+    let lhs_slots_types = lhs_type_ids
+        .iter()
+        .map(|type_id| lhs_fields.get(*type_id as usize).unwrap().data_type());
+
+    let rhs_slots_types = rhs_type_ids
+        .iter()
+        .map(|type_id| rhs_fields.get(*type_id as usize).unwrap().data_type());
+
+    lhs_slots_types.zip(rhs_slots_types).all(|(l, r)| l == r)
+}
+
+fn equal_dense(
+    lhs: &ArrayData,
+    rhs: &ArrayData,
+    lhs_type_ids: &[i8],
+    rhs_type_ids: &[i8],
+    lhs_offsets: &[i32],
+    rhs_offsets: &[i32],
+) -> bool {
+    let offsets = lhs_offsets.iter().zip(rhs_offsets.iter());
+
+    lhs_type_ids
+        .iter()
+        .zip(rhs_type_ids.iter())
+        .zip(offsets)
+        .all(|((l_type_id, r_type_id), (l_offset, r_offset))| {
+            let lhs_values = &lhs.child_data()[*l_type_id as usize];
+            let rhs_values = &rhs.child_data()[*r_type_id as usize];
+
+            equal_values(
+                lhs_values,
+                rhs_values,
+                None,
+                None,
+                *l_offset as usize,
+                *r_offset as usize,
+                1,
+            )
+        })
+}
+
+#[allow(clippy::too_many_arguments)]
+fn equal_sparse(

Review comment:
       I suppose it really does come down to "what does it mean for two `UnionArray`s to be equal"?
   
   
   Can it ever be true that `UnionArray(Int, Utf8)` be equal to `UnionArray(Utf8, Int)`?
   
   Given my reading of the cpp `ArrayEquals` code in https://github.com/apache/arrow/blob/master/cpp/src/arrow/compare.cc it seems to me that two arrays are only ever "equal" if their datatypes are (exactly) equal. 
   
   Maybe we could follow the cpp implementation -- both for consistency as well as for simplicity?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb merged pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r835972012



##########
File path: arrow/src/array/equal/utils.rs
##########
@@ -161,8 +168,35 @@ pub(super) fn child_logical_null_buffer(
             });
             Some(buffer.into())
         }
-        DataType::Union(_, _) => {
-            unimplemented!("Logical equality not yet implemented for union arrays")
+        DataType::Union(_, mode) => {
+            match mode {
+                UnionMode::Sparse => {
+                    // See the logic of `DataType::Struct` in `child_logical_null_buffer`.

Review comment:
       Made it as a function now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r835960131



##########
File path: arrow/src/array/equal/union.rs
##########
@@ -0,0 +1,171 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::datatypes::Field;
+use crate::{
+    array::ArrayData, buffer::Buffer, datatypes::DataType, datatypes::UnionMode,
+};
+
+use super::{
+    equal_range, equal_values, utils::child_logical_null_buffer, utils::equal_nulls,
+};
+
+// Checks if corresponding slots in two UnionArrays are same data types
+fn equal_types(

Review comment:
       Hmm, even type_ids are different, isn't it still possible that the union arrays are equal?
   
   For example, the fields are int, int, float, string. If lhs type_ids is 1, but rhs type_id is 0, they are both int. We don't know if the actual element at child arrays are equal or not.
   
   So here I compare if the type_ids are the same type or not, not its values.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r839300380



##########
File path: arrow/src/array/equal/union.rs
##########
@@ -0,0 +1,171 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::datatypes::Field;
+use crate::{
+    array::ArrayData, buffer::Buffer, datatypes::DataType, datatypes::UnionMode,
+};
+
+use super::{
+    equal_range, equal_values, utils::child_logical_null_buffer, utils::equal_nulls,
+};
+
+// Checks if corresponding slots in two UnionArrays are same data types
+fn equal_types(
+    lhs_fields: &[Field],
+    rhs_fields: &[Field],
+    lhs_type_ids: &[i8],
+    rhs_type_ids: &[i8],
+) -> bool {
+    let lhs_slots_types = lhs_type_ids
+        .iter()
+        .map(|type_id| lhs_fields.get(*type_id as usize).unwrap().data_type());
+
+    let rhs_slots_types = rhs_type_ids
+        .iter()
+        .map(|type_id| rhs_fields.get(*type_id as usize).unwrap().data_type());
+
+    lhs_slots_types.zip(rhs_slots_types).all(|(l, r)| l == r)
+}
+
+fn equal_dense(

Review comment:
       Even type_ids equality is established, lhs and rhs start positions might be different. So we cannot simply compare offsets arrays and child arrays for dense case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#issuecomment-1084958489


   Thansk @alamb @tustvold 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r834430189



##########
File path: arrow/src/array/equal/union.rs
##########
@@ -0,0 +1,171 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::datatypes::Field;
+use crate::{
+    array::ArrayData, buffer::Buffer, datatypes::DataType, datatypes::UnionMode,
+};
+
+use super::{
+    equal_range, equal_values, utils::child_logical_null_buffer, utils::equal_nulls,
+};
+
+// Checks if corresponding slots in two UnionArrays are same data types
+fn equal_types(
+    lhs_fields: &[Field],
+    rhs_fields: &[Field],
+    lhs_type_ids: &[i8],
+    rhs_type_ids: &[i8],
+) -> bool {
+    let lhs_slots_types = lhs_type_ids
+        .iter()
+        .map(|type_id| lhs_fields.get(*type_id as usize).unwrap().data_type());
+
+    let rhs_slots_types = rhs_type_ids
+        .iter()
+        .map(|type_id| rhs_fields.get(*type_id as usize).unwrap().data_type());
+
+    lhs_slots_types.zip(rhs_slots_types).all(|(l, r)| l == r)
+}
+
+fn equal_dense(

Review comment:
       Much like equal_sparse below, by the time we are calling this method we have already established equality of the type_ids, I therefore wonder if we can simply compute equality of the offsets arrays, followed by computing equality of the underlying child arrays.
   
   This appears to be similar to what is done for list_equal :thinking: 

##########
File path: arrow/src/array/equal/utils.rs
##########
@@ -161,8 +168,35 @@ pub(super) fn child_logical_null_buffer(
             });
             Some(buffer.into())
         }
-        DataType::Union(_, _) => {
-            unimplemented!("Logical equality not yet implemented for union arrays")
+        DataType::Union(_, mode) => {
+            match mode {
+                UnionMode::Sparse => {
+                    // See the logic of `DataType::Struct` in `child_logical_null_buffer`.

Review comment:
       Perhaps this could be broken out into a function? I also wonder if it might be possible to implement more efficiently by truncating the longer buffer :thinking: 

##########
File path: arrow/src/array/equal/union.rs
##########
@@ -0,0 +1,171 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::datatypes::Field;
+use crate::{
+    array::ArrayData, buffer::Buffer, datatypes::DataType, datatypes::UnionMode,
+};
+
+use super::{
+    equal_range, equal_values, utils::child_logical_null_buffer, utils::equal_nulls,
+};
+
+// Checks if corresponding slots in two UnionArrays are same data types
+fn equal_types(

Review comment:
       In order to be equal the two arrays must have the same schema and therefore `lhs_fields == rhs_fields` I think therefore this could simply just compare the type_id arrays for equality directly?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r835961099



##########
File path: arrow/src/array/equal/utils.rs
##########
@@ -66,7 +66,14 @@ pub(super) fn equal_nulls(
 
 #[inline]
 pub(super) fn base_equal(lhs: &ArrayData, rhs: &ArrayData) -> bool {
-    lhs.data_type() == rhs.data_type() && lhs.len() == rhs.len()
+    let equal_type = match (lhs.data_type(), rhs.data_type()) {
+        (DataType::Union(l_fields, l_mode), DataType::Union(r_fields, r_mode)) => {
+            // Defer field datatype check to `union_equal`

Review comment:
       Because even the lhs fields and rhs fields are different, the two union arrays are still possible equal. We cannot use fields to decide if two union arrays are equal or not. Fields and type_ids are co-related.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#issuecomment-1074462490


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1469](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c4ee09a) into [master](https://codecov.io/gh/apache/arrow-rs/commit/c5442cf2fd8f046e6ad75d2d5c7efb2899dd654d?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c5442cf) will **increase** coverage by `0.06%`.
   > The diff coverage is `92.95%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1469      +/-   ##
   ==========================================
   + Coverage   82.68%   82.75%   +0.06%     
   ==========================================
     Files         188      190       +2     
     Lines       54361    54702     +341     
   ==========================================
   + Hits        44951    45270     +319     
   - Misses       9410     9432      +22     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2J1aWxkZXIucnM=) | `86.68% <50.00%> (-0.05%)` | :arrow_down: |
   | [arrow/src/array/equal/utils.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2VxdWFsL3V0aWxzLnJz) | `74.45% <50.00%> (-3.24%)` | :arrow_down: |
   | [arrow/src/array/equal/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2VxdWFsL21vZC5ycw==) | `93.98% <98.78%> (+0.65%)` | :arrow_up: |
   | [arrow/src/array/equal/union.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2VxdWFsL3VuaW9uLnJz) | `100.00% <100.00%> (ø)` | |
   | [arrow/src/datatypes/datatype.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9kYXRhdHlwZS5ycw==) | `66.40% <0.00%> (-0.40%)` | :arrow_down: |
   | [arrow/src/datatypes/field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9maWVsZC5ycw==) | `53.79% <0.00%> (-0.31%)` | :arrow_down: |
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `86.46% <0.00%> (-0.12%)` | :arrow_down: |
   | [arrow/src/compute/kernels/filter.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9maWx0ZXIucnM=) | `88.00% <0.00%> (ø)` | |
   | [arrow/src/compute/kernels/length.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9sZW5ndGgucnM=) | `100.00% <0.00%> (ø)` | |
   | [arrow/src/ffi\_stream.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2ZmaV9zdHJlYW0ucnM=) | `79.03% <0.00%> (ø)` | |
   | ... and [8 more](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [c5442cf...c4ee09a](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r835960661



##########
File path: arrow/src/array/equal/union.rs
##########
@@ -0,0 +1,171 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::datatypes::Field;
+use crate::{
+    array::ArrayData, buffer::Buffer, datatypes::DataType, datatypes::UnionMode,
+};
+
+use super::{
+    equal_range, equal_values, utils::child_logical_null_buffer, utils::equal_nulls,
+};
+
+// Checks if corresponding slots in two UnionArrays are same data types
+fn equal_types(

Review comment:
       And two union arrays could be equal even if two fields/type_ids are different. For example, we can change the field position and type_id, but the resulting union array is still the same.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#issuecomment-1074462490


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1469](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (11564d0) into [master](https://codecov.io/gh/apache/arrow-rs/commit/c5442cf2fd8f046e6ad75d2d5c7efb2899dd654d?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c5442cf) will **increase** coverage by `0.02%`.
   > The diff coverage is `93.37%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1469      +/-   ##
   ==========================================
   + Coverage   82.68%   82.71%   +0.02%     
   ==========================================
     Files         188      189       +1     
     Lines       54361    54511     +150     
   ==========================================
   + Hits        44951    45091     +140     
   - Misses       9410     9420      +10     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2J1aWxkZXIucnM=) | `86.68% <50.00%> (-0.05%)` | :arrow_down: |
   | [arrow/src/array/equal/utils.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2VxdWFsL3V0aWxzLnJz) | `74.45% <50.00%> (-3.24%)` | :arrow_down: |
   | [arrow/src/array/equal/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2VxdWFsL21vZC5ycw==) | `93.98% <98.78%> (+0.65%)` | :arrow_up: |
   | [arrow/src/array/equal/union.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2VxdWFsL3VuaW9uLnJz) | `100.00% <100.00%> (ø)` | |
   | [arrow/src/datatypes/field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9maWVsZC5ycw==) | `53.79% <0.00%> (-0.31%)` | :arrow_down: |
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `86.46% <0.00%> (-0.12%)` | :arrow_down: |
   | [parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZW5jb2RpbmdzL2VuY29kaW5nLnJz) | `93.56% <0.00%> (+0.18%)` | :arrow_up: |
   | [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `66.43% <0.00%> (+0.22%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [c5442cf...11564d0](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r836182203



##########
File path: arrow/src/array/equal/union.rs
##########
@@ -0,0 +1,171 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::datatypes::Field;
+use crate::{
+    array::ArrayData, buffer::Buffer, datatypes::DataType, datatypes::UnionMode,
+};
+
+use super::{
+    equal_range, equal_values, utils::child_logical_null_buffer, utils::equal_nulls,
+};
+
+// Checks if corresponding slots in two UnionArrays are same data types
+fn equal_types(

Review comment:
       In the case of two fields with the same type, I believe these must still have field names and therefore there is a logical difference between values stored in one vs the other. I'm not sure about this though, possibly worth an upstream clarification 🤔
   
   On the case of different field ordering the arrays can never compare equal as they have different data types.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r836184489



##########
File path: arrow/src/array/equal/utils.rs
##########
@@ -66,7 +66,14 @@ pub(super) fn equal_nulls(
 
 #[inline]
 pub(super) fn base_equal(lhs: &ArrayData, rhs: &ArrayData) -> bool {
-    lhs.data_type() == rhs.data_type() && lhs.len() == rhs.len()
+    let equal_type = match (lhs.data_type(), rhs.data_type()) {
+        (DataType::Union(l_fields, l_mode), DataType::Union(r_fields, r_mode)) => {
+            // Defer field datatype check to `union_equal`

Review comment:
       The spec describes a union as an ordered sequence of types with names. My interpretation of this is that the fields are order-sensitive and must be identical for the types to be the same... 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#issuecomment-1079968055


   Seems I missed some comments, I will address them soon. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r835960889



##########
File path: arrow/src/array/equal/union.rs
##########
@@ -0,0 +1,171 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::datatypes::Field;
+use crate::{
+    array::ArrayData, buffer::Buffer, datatypes::DataType, datatypes::UnionMode,
+};
+
+use super::{
+    equal_range, equal_values, utils::child_logical_null_buffer, utils::equal_nulls,
+};
+
+// Checks if corresponding slots in two UnionArrays are same data types
+fn equal_types(
+    lhs_fields: &[Field],
+    rhs_fields: &[Field],
+    lhs_type_ids: &[i8],
+    rhs_type_ids: &[i8],
+) -> bool {
+    let lhs_slots_types = lhs_type_ids
+        .iter()
+        .map(|type_id| lhs_fields.get(*type_id as usize).unwrap().data_type());
+
+    let rhs_slots_types = rhs_type_ids
+        .iter()
+        .map(|type_id| rhs_fields.get(*type_id as usize).unwrap().data_type());
+
+    lhs_slots_types.zip(rhs_slots_types).all(|(l, r)| l == r)
+}
+
+fn equal_dense(
+    lhs: &ArrayData,
+    rhs: &ArrayData,
+    lhs_type_ids: &[i8],
+    rhs_type_ids: &[i8],
+    lhs_offsets: &[i32],
+    rhs_offsets: &[i32],
+) -> bool {
+    let offsets = lhs_offsets.iter().zip(rhs_offsets.iter());
+
+    lhs_type_ids
+        .iter()
+        .zip(rhs_type_ids.iter())
+        .zip(offsets)
+        .all(|((l_type_id, r_type_id), (l_offset, r_offset))| {
+            let lhs_values = &lhs.child_data()[*l_type_id as usize];
+            let rhs_values = &rhs.child_data()[*r_type_id as usize];
+
+            equal_values(
+                lhs_values,
+                rhs_values,
+                None,
+                None,
+                *l_offset as usize,
+                *r_offset as usize,
+                1,
+            )
+        })
+}
+
+#[allow(clippy::too_many_arguments)]
+fn equal_sparse(

Review comment:
       This is related to above question. As field and type_id can be changed in position, but the resulting union array is still the same, we cannot directly compare them, but can only compare child array corresponding to type_id.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r834430646



##########
File path: arrow/src/array/equal/utils.rs
##########
@@ -66,7 +66,14 @@ pub(super) fn equal_nulls(
 
 #[inline]
 pub(super) fn base_equal(lhs: &ArrayData, rhs: &ArrayData) -> bool {
-    lhs.data_type() == rhs.data_type() && lhs.len() == rhs.len()
+    let equal_type = match (lhs.data_type(), rhs.data_type()) {
+        (DataType::Union(l_fields, l_mode), DataType::Union(r_fields, r_mode)) => {
+            // Defer field datatype check to `union_equal`

Review comment:
       Why?

##########
File path: arrow/src/array/equal/union.rs
##########
@@ -0,0 +1,171 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::datatypes::Field;
+use crate::{
+    array::ArrayData, buffer::Buffer, datatypes::DataType, datatypes::UnionMode,
+};
+
+use super::{
+    equal_range, equal_values, utils::child_logical_null_buffer, utils::equal_nulls,
+};
+
+// Checks if corresponding slots in two UnionArrays are same data types
+fn equal_types(
+    lhs_fields: &[Field],
+    rhs_fields: &[Field],
+    lhs_type_ids: &[i8],
+    rhs_type_ids: &[i8],
+) -> bool {
+    let lhs_slots_types = lhs_type_ids
+        .iter()
+        .map(|type_id| lhs_fields.get(*type_id as usize).unwrap().data_type());
+
+    let rhs_slots_types = rhs_type_ids
+        .iter()
+        .map(|type_id| rhs_fields.get(*type_id as usize).unwrap().data_type());
+
+    lhs_slots_types.zip(rhs_slots_types).all(|(l, r)| l == r)
+}
+
+fn equal_dense(
+    lhs: &ArrayData,
+    rhs: &ArrayData,
+    lhs_type_ids: &[i8],
+    rhs_type_ids: &[i8],
+    lhs_offsets: &[i32],
+    rhs_offsets: &[i32],
+) -> bool {
+    let offsets = lhs_offsets.iter().zip(rhs_offsets.iter());
+
+    lhs_type_ids
+        .iter()
+        .zip(rhs_type_ids.iter())
+        .zip(offsets)
+        .all(|((l_type_id, r_type_id), (l_offset, r_offset))| {
+            let lhs_values = &lhs.child_data()[*l_type_id as usize];
+            let rhs_values = &rhs.child_data()[*r_type_id as usize];
+
+            equal_values(
+                lhs_values,
+                rhs_values,
+                None,
+                None,
+                *l_offset as usize,
+                *r_offset as usize,
+                1,
+            )
+        })
+}
+
+#[allow(clippy::too_many_arguments)]
+fn equal_sparse(

Review comment:
       By the time we are calling this method we have established that 
   
   * the types are equal
   * the null buffers are equal
   * the type ids arrays are equal
   
   As such I'm surprised this can't just do a standard equality comparison of the child data, which will take into null buffers of the children




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter commented on pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#issuecomment-1074462490


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1469](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d896a9b) into [master](https://codecov.io/gh/apache/arrow-rs/commit/b882d4599d29197b3d41a91fac29cdb8702e8788?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b882d45) will **increase** coverage by `0.04%`.
   > The diff coverage is `93.54%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1469      +/-   ##
   ==========================================
   + Coverage   82.69%   82.73%   +0.04%     
   ==========================================
     Files         187      188       +1     
     Lines       54144    54322     +178     
   ==========================================
   + Hits        44773    44946     +173     
   - Misses       9371     9376       +5     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2J1aWxkZXIucnM=) | `86.68% <50.00%> (-0.05%)` | :arrow_down: |
   | [arrow/src/array/equal/utils.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2VxdWFsL3V0aWxzLnJz) | `70.94% <55.55%> (-3.06%)` | :arrow_down: |
   | [arrow/src/array/equal/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2VxdWFsL21vZC5ycw==) | `93.98% <98.78%> (+0.65%)` | :arrow_up: |
   | [arrow/src/array/equal/union.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2VxdWFsL3VuaW9uLnJz) | `100.00% <100.00%> (ø)` | |
   | [arrow/src/datatypes/datatype.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9kYXRhdHlwZS5ycw==) | `66.40% <0.00%> (-0.40%)` | :arrow_down: |
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `86.46% <0.00%> (-0.12%)` | :arrow_down: |
   | [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `66.43% <0.00%> (+0.22%)` | :arrow_up: |
   | [arrow/src/array/array.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2FycmF5LnJz) | `85.67% <0.00%> (+0.26%)` | :arrow_up: |
   | [arrow/src/datatypes/field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9maWVsZC5ycw==) | `54.10% <0.00%> (+0.30%)` | :arrow_up: |
   | [arrow/src/ffi.rs](https://codecov.io/gh/apache/arrow-rs/pull/1469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2ZmaS5ycw==) | `87.50% <0.00%> (+1.71%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [b882d45...d896a9b](https://codecov.io/gh/apache/arrow-rs/pull/1469?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r837849833



##########
File path: arrow/src/array/equal/union.rs
##########
@@ -0,0 +1,171 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::datatypes::Field;
+use crate::{
+    array::ArrayData, buffer::Buffer, datatypes::DataType, datatypes::UnionMode,
+};
+
+use super::{
+    equal_range, equal_values, utils::child_logical_null_buffer, utils::equal_nulls,
+};
+
+// Checks if corresponding slots in two UnionArrays are same data types
+fn equal_types(
+    lhs_fields: &[Field],
+    rhs_fields: &[Field],
+    lhs_type_ids: &[i8],
+    rhs_type_ids: &[i8],
+) -> bool {
+    let lhs_slots_types = lhs_type_ids
+        .iter()
+        .map(|type_id| lhs_fields.get(*type_id as usize).unwrap().data_type());
+
+    let rhs_slots_types = rhs_type_ids
+        .iter()
+        .map(|type_id| rhs_fields.get(*type_id as usize).unwrap().data_type());
+
+    lhs_slots_types.zip(rhs_slots_types).all(|(l, r)| l == r)
+}
+
+fn equal_dense(
+    lhs: &ArrayData,
+    rhs: &ArrayData,
+    lhs_type_ids: &[i8],
+    rhs_type_ids: &[i8],
+    lhs_offsets: &[i32],
+    rhs_offsets: &[i32],
+) -> bool {
+    let offsets = lhs_offsets.iter().zip(rhs_offsets.iter());
+
+    lhs_type_ids
+        .iter()
+        .zip(rhs_type_ids.iter())
+        .zip(offsets)
+        .all(|((l_type_id, r_type_id), (l_offset, r_offset))| {
+            let lhs_values = &lhs.child_data()[*l_type_id as usize];
+            let rhs_values = &rhs.child_data()[*r_type_id as usize];
+
+            equal_values(
+                lhs_values,
+                rhs_values,
+                None,
+                None,
+                *l_offset as usize,
+                *r_offset as usize,
+                1,
+            )
+        })
+}
+
+#[allow(clippy::too_many_arguments)]
+fn equal_sparse(

Review comment:
       Ah, if it is the case, we can simplify the comparison here. What I have in mind and implemented here can be thought as logically equal. That means corresponding slots of two union arrays are equal, but their physical layout may be different -- type_ids and fields can be different separately.
   
   If we only want to compare the physical layout, then fields and type_ids are required to be exactly equal.
   
   Thanks for pointing out the cpp implementation. We should follow it, I agree.
   
   I will revise this change for that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a change in pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#discussion_r839296459



##########
File path: arrow/src/array/equal/utils.rs
##########
@@ -66,7 +66,14 @@ pub(super) fn equal_nulls(
 
 #[inline]
 pub(super) fn base_equal(lhs: &ArrayData, rhs: &ArrayData) -> bool {
-    lhs.data_type() == rhs.data_type() && lhs.len() == rhs.len()
+    let equal_type = match (lhs.data_type(), rhs.data_type()) {
+        (DataType::Union(l_fields, l_mode), DataType::Union(r_fields, r_mode)) => {
+            // Defer field datatype check to `union_equal`

Review comment:
       Revised to compare lhs and rhs fields here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya edited a comment on pull request #1469: Implement ArrayEqual for UnionArray

Posted by GitBox <gi...@apache.org>.
viirya edited a comment on pull request #1469:
URL: https://github.com/apache/arrow-rs/pull/1469#issuecomment-1084958489


   Thanks @alamb @tustvold 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org