Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/14 10:32:43 UTC

[GitHub] [spark] HyukjinKwon commented on a change in pull request #27565: [WIP][SPARK-30791] Dataframe add sameSemantics and sementicHash method

URL: https://github.com/apache/spark/pull/27565#discussion_r379358950
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
 ##########
 @@ -3308,6 +3308,37 @@ class Dataset[T] private[sql](
     files.toSet.toArray
   }
 
+  /**
+   * Returns true when the query plan of the given Dataset will return the same results as this
+   * Dataset.
+   *
+   * Since it's likely undecidable to generally determine if two given plans will produce the same
+   * results, it is okay for this function to return false, even if the results are actually
+   * the same.  Such behavior will not affect correctness, only the application of performance
+   * enhancements like caching.  However, it is not acceptable to return true if the results could
+   * possibly be different.
+   *
+   * This function performs a modified version of equality that is tolerant of cosmetic
+   * differences like attribute naming and/or expression id differences.
+   *
+   * @since 3.1.0
+   */
+  @DeveloperApi
+  def sameSemantics(other: Dataset[T]): Boolean = {
+    queryExecution.analyzed.sameResult(other.queryExecution.analyzed)
+  }
+
+  /**
+   * Returns a `hashCode` for the calculation performed by the query plan of this Dataset. Unlike
+   * the standard `hashCode`, an attempt has been made to eliminate cosmetic differences.
 
 Review comment:
   I would write it as below:
   
   ```
   Returns a `hashCode` of the logical query plan against this [[Dataset]].
   
   @note Unlike the standard `hashCode`, the hash is calculated against the query plan
   simplified by tolerating the cosmetic differences such as attribute names.
   ```
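
To make "tolerating cosmetic differences" concrete, here is a minimal toy sketch of the idea in plain Scala. It is not Spark's Catalyst code: `Expr`, `Attr`, `Lit`, `Add`, `Project`, and `ToySemantics` are hypothetical names invented for illustration. The sketch assumes a one-operator plan and canonicalizes it by dropping attribute names and replacing each expression id with the column's ordinal in the input, so two plans that differ only in naming or ids compare equal and hash alike.

```scala
// Toy model only: hypothetical types, not Spark's Catalyst classes.
sealed trait Expr
final case class Attr(name: String, exprId: Long) extends Expr
final case class Lit(value: Int) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

// A one-operator "plan": evaluate `expr` over the columns of `table`.
final case class Project(expr: Expr, table: String, input: Seq[Attr])

object ToySemantics {
  // Strip cosmetic details: attribute names are blanked and each expression id
  // is replaced by the ordinal of that column in the plan's input.
  // Assumes every attribute in `expr` appears in `input` (a resolved plan).
  def canonicalize(p: Project): Project = {
    val ordinalOf: Map[Long, Long] =
      p.input.map(_.exprId).zipWithIndex.map { case (id, i) => id -> i.toLong }.toMap

    def go(e: Expr): Expr = e match {
      case Attr(_, id) => Attr("", ordinalOf(id))
      case l: Lit      => l
      case Add(l, r)   => Add(go(l), go(r))
    }

    Project(go(p.expr), p.table, p.input.indices.map(i => Attr("", i.toLong)))
  }

  // True only when the canonical forms are structurally identical.
  def sameSemantics(a: Project, b: Project): Boolean =
    canonicalize(a) == canonicalize(b)

  // Hash of the canonical form, so sameSemantics(a, b) implies equal hashes.
  def semanticHash(p: Project): Int =
    canonicalize(p).hashCode
}

object Demo {
  def main(args: Array[String]): Unit = {
    import ToySemantics._

    // Same computation over table "t"; only names and ids differ.
    val q1 = Project(Add(Attr("a", 1L), Lit(1)), "t", Seq(Attr("a", 1L), Attr("b", 2L)))
    val q2 = Project(Add(Attr("x", 7L), Lit(1)), "t", Seq(Attr("x", 7L), Attr("y", 8L)))
    // Reads the second column instead of the first: a real difference.
    val q3 = Project(Add(Attr("y", 8L), Lit(1)), "t", Seq(Attr("x", 7L), Attr("y", 8L)))
    // Mathematically equal to q1, but structurally different.
    val q4 = Project(Add(Lit(1), Attr("a", 1L)), "t", Seq(Attr("a", 1L), Attr("b", 2L)))

    assert(sameSemantics(q1, q2))
    assert(semanticHash(q1) == semanticHash(q2))
    assert(!sameSemantics(q1, q3))
    assert(!sameSemantics(q1, q4)) // a conservative false negative
  }
}
```

The `q4` case mirrors the contract in the doc comment under review: returning false for plans that happen to produce the same results is acceptable, while returning true for plans that could differ is not.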
