You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/02/28 21:00:06 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5421: Implement/fix Eq and Hash for Expr and LogicalPlan

alamb commented on code in PR #5421:
URL: https://github.com/apache/arrow-datafusion/pull/5421#discussion_r1120754049


##########
datafusion/common/src/dfschema.rs:
##########
@@ -496,6 +497,15 @@ impl From<DFSchema> for SchemaRef {
     }
 }
 
+// Hashing refers to a subset of fields considered in PartialEq.
+#[allow(clippy::derive_hash_xor_eq)]
+impl Hash for DFSchema {
+    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
+        self.fields.hash(state);
+        self.metadata.len().hash(state); // HashMap is not hashable

Review Comment:
   I agree it is ok to just use the metadata's length to hash as it satisfies the EQ constraint
   
   https://doc.rust-lang.org/std/hash/trait.Hash.html#hash-and-eq



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1515,8 +1515,30 @@ pub struct TableScan {
     pub fetch: Option<usize>,
 }
 
+impl PartialEq for TableScan {
+    fn eq(&self, other: &Self) -> bool {
+        self.table_name == other.table_name
+            && self.projection == other.projection
+            && self.projected_schema == other.projected_schema
+            && self.filters == other.filters
+            && self.fetch == other.fetch
+    }
+}
+
+impl Eq for TableScan {}
+
+impl Hash for TableScan {
+    fn hash<H: Hasher>(&self, state: &mut H) {
+        self.table_name.hash(state);
+        self.projection.hash(state);

Review Comment:
   Hash is ok that it doesn't also include `source` I think



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1685,8 +1725,33 @@ pub struct Extension {
     pub node: Arc<dyn UserDefinedLogicalNode>,
 }
 
+struct ExtensionExplainDisplay<'a> {
+    extension: &'a Extension,
+}
+
+impl Display for ExtensionExplainDisplay<'_> {
+    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
+        self.extension.node.fmt_for_explain(f)
+    }
+}
+
+impl PartialEq for Extension {
+    fn eq(&self, other: &Self) -> bool {
+        format!("{}", ExtensionExplainDisplay { extension: self })
+            == format!("{}", ExtensionExplainDisplay { extension: other })
+    }
+}
+
+impl Eq for Extension {}
+
+impl Hash for Extension {

Review Comment:
   I am not sure about using the textual output for equality comparison as someone who has implemented Extension may have a constant output, for example. I think a safer (though backwards incompatible change) would be to make `UserDefinedLogicalNode` also be `Hash` and `PartialEq`
   
   Like
   
   ```rust
   trait UserDefinedLogicalNode: Hash + PartialEq {
   ...
   }
   ```



##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -326,13 +327,12 @@ impl Optimizer {
             // TODO this is an expensive way to see if the optimizer did anything and
             // it would be better to change the OptimizerRule trait to return an Option
             // instead
-            let new_plan_str = format!("{}", new_plan.display_indent());
-            if plan_str == new_plan_str {
+            if old_plan.as_ref() == &new_plan {

Review Comment:
   👍  nice
   
   



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1515,8 +1515,30 @@ pub struct TableScan {
     pub fetch: Option<usize>,
 }
 
+impl PartialEq for TableScan {
+    fn eq(&self, other: &Self) -> bool {
+        self.table_name == other.table_name

Review Comment:
   I think this also needs to check that the `source` is equal as well -- I think we can do so via https://doc.rust-lang.org/std/sync/struct.Arc.html#method.ptr_eq



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1846,7 +1911,7 @@ impl Join {
 }
 
 /// Subquery
-#[derive(Clone)]
+#[derive(Clone, PartialEq, Eq, Hash)]

Review Comment:
   🎉 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org