Posted to github@arrow.apache.org by "mslapek (via GitHub)" <gi...@apache.org> on 2023/02/27 16:57:32 UTC

[GitHub] [arrow-datafusion] mslapek opened a new pull request, #5421: Implement/fix Eq and Hash for Expr and LogicalPlan

mslapek opened a new pull request, #5421:
URL: https://github.com/apache/arrow-datafusion/pull/5421

# Which issue does this PR close?

Closes #5400.

# Rationale for this change

Makes [datafusion::logical_expr::Subquery](https://docs.rs/datafusion-expr/18.0.0/datafusion_expr/struct.Subquery.html) respect the requirements of [std::cmp::Eq](https://doc.rust-lang.org/std/cmp/trait.Eq.html).

Because `Expr` contains `Subquery`, it also fixes `Expr::eq(..)`.

# What changes are included in this PR?

Fixed `Eq` for `Expr`.

Added `PartialEq`, `Eq` and `Hash` traits to `LogicalPlan` (because `Expr` contains `LogicalPlan` through `Subquery`).

Replaced the comparison of stringified plans with `eq(...)` comparison in `optimizer.rs`.

# Are these changes tested?

No new tests - most of the PR's content is generated by `derive(..)` macros.

The optimizer loop is already covered by existing tests.

# Are there any user-facing changes?

Added `PartialEq`, `Eq` and `Hash` traits to `LogicalPlan` and a few other structures.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] mslapek commented on a diff in pull request #5421: Implement/fix Eq and Hash for Expr and LogicalPlan

Posted by "mslapek (via GitHub)" <gi...@apache.org>.

mslapek commented on code in PR #5421:
URL: https://github.com/apache/arrow-datafusion/pull/5421#discussion_r1122188389


##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1685,8 +1725,33 @@ pub struct Extension {
     pub node: Arc<dyn UserDefinedLogicalNode>,
 }
 
+struct ExtensionExplainDisplay<'a> {
+    extension: &'a Extension,
+}
+
+impl Display for ExtensionExplainDisplay<'_> {
+    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
+        self.extension.node.fmt_for_explain(f)
+    }
+}
+
+impl PartialEq for Extension {
+    fn eq(&self, other: &Self) -> bool {
+        format!("{}", ExtensionExplainDisplay { extension: self })
+            == format!("{}", ExtensionExplainDisplay { extension: other })
+    }
+}
+
+impl Eq for Extension {}
+
+impl Hash for Extension {

Review Comment:
   Couldn't add `Hash + PartialEq`, because `UserDefinedLogicalNode` must be *object-safe*.
   
   Instead added `dyn_eq` and `dyn_hash` methods serving the same purpose.
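
The `dyn_eq` / `dyn_hash` approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not DataFusion's actual code: the `Node` trait, the `TopK` and `Sample` node types, and the `hash_of` helper are all hypothetical names. `PartialEq` and `Hash` cannot be supertraits of a trait used as `dyn Trait` (one refers to `Self` in an argument, the other has a generic method), so the sketch exposes object-safe equivalents and lets the wrapper struct forward to them.

```rust
use std::any::Any;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

// Hypothetical stand-in for `UserDefinedLogicalNode` (not the real trait).
trait Node {
    fn as_any(&self) -> &dyn Any;
    // Object-safe substitute for `PartialEq`: takes a trait object, not `Self`.
    fn dyn_eq(&self, other: &dyn Node) -> bool;
    // Object-safe substitute for `Hash`: takes `&mut dyn Hasher`, not a generic `H`.
    fn dyn_hash(&self, state: &mut dyn Hasher);
}

#[derive(PartialEq, Eq, Hash)]
struct TopK {
    k: usize,
}

#[derive(PartialEq, Eq, Hash)]
struct Sample {
    k: usize,
}

impl Node for TopK {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn dyn_eq(&self, other: &dyn Node) -> bool {
        // Nodes of a different concrete type are never equal.
        match other.as_any().downcast_ref::<Self>() {
            Some(o) => self == o,
            None => false,
        }
    }
    fn dyn_hash(&self, state: &mut dyn Hasher) {
        // `&mut dyn Hasher` itself implements `Hasher`, so it can be passed on.
        let mut s = state;
        self.hash(&mut s);
    }
}

impl Node for Sample {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn dyn_eq(&self, other: &dyn Node) -> bool {
        match other.as_any().downcast_ref::<Self>() {
            Some(o) => self == o,
            None => false,
        }
    }
    fn dyn_hash(&self, state: &mut dyn Hasher) {
        let mut s = state;
        self.hash(&mut s);
    }
}

// Hypothetical wrapper mirroring the shape of `Extension`.
struct Extension {
    node: Arc<dyn Node>,
}

impl PartialEq for Extension {
    fn eq(&self, other: &Self) -> bool {
        self.node.dyn_eq(other.node.as_ref())
    }
}

impl Eq for Extension {}

impl Hash for Extension {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.node.dyn_hash(state);
    }
}

// Helper for the demo: hash a value with the default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a = Extension { node: Arc::new(TopK { k: 3 }) };
    let b = Extension { node: Arc::new(TopK { k: 3 }) };
    let c = Extension { node: Arc::new(Sample { k: 3 }) };
    println!("a == b: {}", a == b);
    println!("a == c: {}", a == c);
    println!("hashes match: {}", hash_of(&a) == hash_of(&b));
}
```

The downcast in `dyn_eq` is what keeps two different node types from comparing equal even when their fields happen to match.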



[GitHub] [arrow-datafusion] mslapek commented on pull request #5421: Implement/fix Eq and Hash for Expr and LogicalPlan

Posted by "mslapek (via GitHub)" <gi...@apache.org>.

mslapek commented on PR #5421:
URL: https://github.com/apache/arrow-datafusion/pull/5421#issuecomment-1453471858

   @alamb Thanks for the review! 🎉


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5421: Implement/fix Eq and Hash for Expr and LogicalPlan

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on code in PR #5421:
URL: https://github.com/apache/arrow-datafusion/pull/5421#discussion_r1120754049


##########
datafusion/common/src/dfschema.rs:
##########
@@ -496,6 +497,15 @@ impl From<DFSchema> for SchemaRef {
     }
 }
 
+// Hashing refers to a subset of fields considered in PartialEq.
+#[allow(clippy::derive_hash_xor_eq)]
+impl Hash for DFSchema {
+    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
+        self.fields.hash(state);
+        self.metadata.len().hash(state); // HashMap is not hashable

Review Comment:
   I agree it is ok to hash just the metadata's length, as that still satisfies the `Eq` constraint
   
   https://doc.rust-lang.org/std/hash/trait.Hash.html#hash-and-eq
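
To illustrate why hashing a subset of the compared fields is still valid, here is a small sketch. The `Schema` struct and `hash_of` helper are hypothetical simplifications, not `DFSchema` itself. The rule from the linked docs is only that `a == b` implies `hash(a) == hash(b)`; unequal values are allowed to collide.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical, simplified schema: equality looks at every field.
#[derive(Debug, PartialEq, Eq)]
struct Schema {
    fields: Vec<String>,
    metadata: HashMap<String, String>,
}

// Hashing looks at a subset: `HashMap` does not implement `Hash`,
// so only its length is fed into the hasher.
impl Hash for Schema {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.fields.hash(state);
        self.metadata.len().hash(state);
    }
}

// Helper for the demo: hash a value with the default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Build a schema with one field and one metadata entry.
fn schema_with(meta_value: &str) -> Schema {
    let mut metadata = HashMap::new();
    metadata.insert("origin".to_string(), meta_value.to_string());
    Schema {
        fields: vec!["id".to_string()],
        metadata,
    }
}

fn main() {
    let a = schema_with("x");
    let b = schema_with("x");
    let c = schema_with("y");
    // Equal values must hash equally.
    println!("a == b: {}, same hash: {}", a == b, hash_of(&a) == hash_of(&b));
    // Unequal values may collide; that costs speed, not correctness.
    println!("a == c: {}, same hash: {}", a == c, hash_of(&a) == hash_of(&c));
}
```

The trade-off is more collisions for schemas that differ only in metadata content, which a hash map resolves by falling back to `==`.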



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1515,8 +1515,30 @@ pub struct TableScan {
     pub fetch: Option<usize>,
 }
 
+impl PartialEq for TableScan {
+    fn eq(&self, other: &Self) -> bool {
+        self.table_name == other.table_name
+            && self.projection == other.projection
+            && self.projected_schema == other.projected_schema
+            && self.filters == other.filters
+            && self.fetch == other.fetch
+    }
+}
+
+impl Eq for TableScan {}
+
+impl Hash for TableScan {
+    fn hash<H: Hasher>(&self, state: &mut H) {
+        self.table_name.hash(state);
+        self.projection.hash(state);

Review Comment:
   I think it is ok that `Hash` doesn't also include `source`



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1685,8 +1725,33 @@ pub struct Extension {
     pub node: Arc<dyn UserDefinedLogicalNode>,
 }
 
+struct ExtensionExplainDisplay<'a> {
+    extension: &'a Extension,
+}
+
+impl Display for ExtensionExplainDisplay<'_> {
+    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
+        self.extension.node.fmt_for_explain(f)
+    }
+}
+
+impl PartialEq for Extension {
+    fn eq(&self, other: &Self) -> bool {
+        format!("{}", ExtensionExplainDisplay { extension: self })
+            == format!("{}", ExtensionExplainDisplay { extension: other })
+    }
+}
+
+impl Eq for Extension {}
+
+impl Hash for Extension {

Review Comment:
   I am not sure about using the textual output for equality comparison, as someone who has implemented `Extension` may have a constant output, for example. I think a safer (though backwards-incompatible) change would be to make `UserDefinedLogicalNode` also be `Hash` and `PartialEq`
   
   Like
   
   ```rust
   trait UserDefinedLogicalNode: Hash + PartialEq {
   ...
   }
   ```
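
The concern about constant textual output can be shown with a small sketch. Everything here is hypothetical (`MyNode`, its `fmt_for_explain` method, and the `text_eq` / `field_eq` helpers); it only mimics the shape of the diff above, where equality is derived from the formatted explain string.

```rust
use std::fmt;

// Hypothetical user-defined node whose explain output ignores its state.
struct MyNode {
    limit: usize,
}

impl MyNode {
    // Analogue of `fmt_for_explain`: prints a constant label only.
    fn fmt_for_explain(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "MyNode")
    }
}

// Adapter so the explain output can be used with `format!`.
struct ExplainDisplay<'a>(&'a MyNode);

impl fmt::Display for ExplainDisplay<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.0.fmt_for_explain(f)
    }
}

// Equality via formatted text, as in the diff under review.
fn text_eq(a: &MyNode, b: &MyNode) -> bool {
    format!("{}", ExplainDisplay(a)) == format!("{}", ExplainDisplay(b))
}

// Equality via the actual state of the node.
fn field_eq(a: &MyNode, b: &MyNode) -> bool {
    a.limit == b.limit
}

fn main() {
    let a = MyNode { limit: 1 };
    let b = MyNode { limit: 2 };
    // The two nodes differ, but their explain strings do not.
    println!("text_eq: {}", text_eq(&a, &b));
    println!("field_eq: {}", field_eq(&a, &b));
}
```

With text-based equality, an optimizer pass that only changes `limit` would look like a no-op.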



##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -326,13 +327,12 @@ impl Optimizer {
             // TODO this is an expensive way to see if the optimizer did anything and
             // it would be better to change the OptimizerRule trait to return an Option
             // instead
-            let new_plan_str = format!("{}", new_plan.display_indent());
-            if plan_str == new_plan_str {
+            if old_plan.as_ref() == &new_plan {

Review Comment:
   👍  nice
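
As a rough sketch of what the `==` comparison buys in an optimizer loop: with `PartialEq` derived on the plan type, a fixed-point loop can detect "nothing changed" structurally instead of formatting both plans to strings. The `Plan` enum, the `merge_limits` rule, and `optimize` below are hypothetical toy stand-ins, not DataFusion's types.

```rust
// Hypothetical toy plan; the derives give structural equality and hashing.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Plan {
    Scan(String),
    Filter { predicate: String, input: Box<Plan> },
    Limit { n: usize, input: Box<Plan> },
}

// Hypothetical rewrite rule: collapse directly nested limits into the smaller one.
fn merge_limits(plan: &Plan) -> Plan {
    match plan {
        Plan::Limit { n, input } => match merge_limits(input) {
            Plan::Limit { n: m, input: inner } => Plan::Limit {
                n: (*n).min(m),
                input: inner,
            },
            other => Plan::Limit {
                n: *n,
                input: Box::new(other),
            },
        },
        Plan::Filter { predicate, input } => Plan::Filter {
            predicate: predicate.clone(),
            input: Box::new(merge_limits(input)),
        },
        Plan::Scan(name) => Plan::Scan(name.clone()),
    }
}

// Apply the rule until the plan stops changing or `max_passes` is reached.
// Returns the final plan and the number of passes that were run.
fn optimize(mut plan: Plan, max_passes: usize) -> (Plan, usize) {
    let mut passes = 0;
    while passes < max_passes {
        let new_plan = merge_limits(&plan);
        passes += 1;
        // Structural comparison; no string formatting involved.
        if new_plan == plan {
            break;
        }
        plan = new_plan;
    }
    (plan, passes)
}

fn main() {
    let scan = Plan::Scan("t".to_string());
    let nested = Plan::Limit {
        n: 10,
        input: Box::new(Plan::Limit {
            n: 5,
            input: Box::new(scan),
        }),
    };
    let (optimized, passes) = optimize(nested, 8);
    println!("{:?} after {} passes", optimized, passes);
}
```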
   
   



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1515,8 +1515,30 @@ pub struct TableScan {
     pub fetch: Option<usize>,
 }
 
+impl PartialEq for TableScan {
+    fn eq(&self, other: &Self) -> bool {
+        self.table_name == other.table_name

Review Comment:
   I think this also needs to check that the `source` is equal as well -- I think we can do so via https://doc.rust-lang.org/std/sync/struct.Arc.html#method.ptr_eq



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1846,7 +1911,7 @@ impl Join {
 }
 
 /// Subquery
-#[derive(Clone)]
+#[derive(Clone, PartialEq, Eq, Hash)]

Review Comment:
   🎉 



[GitHub] [arrow-datafusion] alamb merged pull request #5421: Implement/fix Eq and Hash for Expr and LogicalPlan

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb merged PR #5421:
URL: https://github.com/apache/arrow-datafusion/pull/5421


[GitHub] [arrow-datafusion] mslapek commented on a diff in pull request #5421: Implement/fix Eq and Hash for Expr and LogicalPlan

Posted by "mslapek (via GitHub)" <gi...@apache.org>.

mslapek commented on code in PR #5421:
URL: https://github.com/apache/arrow-datafusion/pull/5421#discussion_r1122161900


##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1515,8 +1515,30 @@ pub struct TableScan {
     pub fetch: Option<usize>,
 }
 
+impl PartialEq for TableScan {
+    fn eq(&self, other: &Self) -> bool {
+        self.table_name == other.table_name

Review Comment:
   This `Arc::ptr_eq` might be **risky**... [Arc::ptr_eq](https://doc.rust-lang.org/std/sync/struct.Arc.html#method.ptr_eq) and [std::ptr::eq](https://doc.rust-lang.org/std/ptr/fn.eq.html) say that `dyn` trait comparisons are **unreliable**. 😕
   
   Even clippy gives an error [vtable_address_comparisons](https://rust-lang.github.io/rust-clippy/master/index.html#vtable_address_comparisons) from **correctness** 🔞 category.
   
   I suggest reconsidering the request about `source` comparison.
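
For context on the risk described above: a `dyn` pointer carries both a data address and a vtable address, and it is the vtable part that makes comparisons unreliable. One commonly used workaround, which was not proposed in this thread and is shown here only as a sketch, is to cast both pointers to thin pointers and compare the data addresses alone. The `Source` trait, `MemTable` type, and `same_allocation` helper are hypothetical names.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a table source trait.
trait Source {
    fn name(&self) -> &str;
}

struct MemTable(String);

impl Source for MemTable {
    fn name(&self) -> &str {
        &self.0
    }
}

// Compare only the data addresses by casting the fat `dyn` pointers to thin
// pointers, which drops the vtable part before the comparison.
fn same_allocation(a: &Arc<dyn Source>, b: &Arc<dyn Source>) -> bool {
    Arc::as_ptr(a) as *const () == Arc::as_ptr(b) as *const ()
}

fn main() {
    let a: Arc<dyn Source> = Arc::new(MemTable("t".to_string()));
    let b = Arc::clone(&a); // same allocation as `a`
    let c: Arc<dyn Source> = Arc::new(MemTable("t".to_string())); // separate allocation
    println!("{} vs clone: {}", a.name(), same_allocation(&a, &b));
    println!("{} vs copy: {}", a.name(), same_allocation(&a, &c));
}
```

Note that this is identity, not value equality: two separately built sources with identical contents compare as different.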



[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5421: Implement/fix Eq and Hash for Expr and LogicalPlan

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on code in PR #5421:
URL: https://github.com/apache/arrow-datafusion/pull/5421#discussion_r1123719124


##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1515,8 +1515,30 @@ pub struct TableScan {
     pub fetch: Option<usize>,
 }
 
+impl PartialEq for TableScan {
+    fn eq(&self, other: &Self) -> bool {
+        self.table_name == other.table_name

Review Comment:
   Upon consideration, within a single plan the `table_name` should be unique. I was originally worried about the case where two `TableScan`s had different `source`s but all other fields the same, resulting in a false positive.
   
   



[GitHub] [arrow-datafusion] ursabot commented on pull request #5421: Implement/fix Eq and Hash for Expr and LogicalPlan

Posted by "ursabot (via GitHub)" <gi...@apache.org>.

ursabot commented on PR #5421:
URL: https://github.com/apache/arrow-datafusion/pull/5421#issuecomment-1453467023

   Benchmark runs are scheduled for baseline = be6efbc93f04d4459e8f61345c830afc73d08fd7 and contender = 61fc51446cb06bc6c8de69d50c9e5f79dede08fb. 61fc51446cb06bc6c8de69d50c9e5f79dede08fb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/4d2822ccb6054550a6b65ebec0e2ac2c...c6c58fc8d857432eb57daa23924e8220/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/de32cc10c6f1471ca92a826bd0b9e793...1c7717b001544c08a515f2455601ddc9/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/1162f3b65a204bfdb0310651d9407834...1762566ec8854078bfcbe742c4f116df/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/08dac026f95e4e4fb468f876018376a2...978138e916234650ab2f069662e60eff/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   

