
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/10/29 09:50:09 UTC

[GitHub] [spark] HyukjinKwon commented on a change in pull request #26286: [SPARK-26739][SQL] Standardized Join Types for DataFrames

HyukjinKwon commented on a change in pull request #26286: [SPARK-26739][SQL] Standardized Join Types for DataFrames
URL: https://github.com/apache/spark/pull/26286#discussion_r339977592
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
 ##########
 @@ -980,21 +980,48 @@ class Dataset[T] private[sql](
    * `DataFrame`s, you will NOT be able to reference any columns after the join, since
    * there is no way to disambiguate which side of the join you would like to reference.
    *
+   * @deprecated Use
+   * [[Dataset.join(Dataset[_], Seq[String], JoinType): DataFrame* this version]] instead
+   *
    * @group untypedrel
    * @since 2.0.0
    */
+  @deprecated("Use [[Dataset#join(Dataset[_], Seq[String], JoinType): DataFrame* this]]", "3.0.0")
   def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame = {
+    join(right, usingColumns, JoinType(joinType))
+  }
+
+  /**
+   * Equi-join with another `DataFrame` using the given columns. A cross join with a predicate
+   * is specified as an inner join. If you would explicitly like to perform a cross join use the
+   * `crossJoin` method.
+   *
+   * Different from other join functions, the join columns will only appear once in the output,
+   * i.e. similar to SQL's `JOIN USING` syntax.
+   *
+   * @param right Right side of the join operation.
+   * @param usingColumns Names of the columns to join on. These columns must exist on both sides.
+   * @param joinType Type of join to perform (instance of [[JoinType!]]). Default [[Inner]].
+   *
+   * @note If you perform a self-join using this function without aliasing the input
+   * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+   * there is no way to disambiguate which side of the join you would like to reference.
+   *
+   * @group untypedrel
+   * @since 2.0.0
+   */
+  def join(right: Dataset[_], usingColumns: Seq[String], joinType: JoinType): DataFrame = {
 
 Review comment:
   `JoinType` isn't supposed to be a public API. It is meant for internal use and lives under catalyst. Why don't we just add enums?
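
   To make the enum suggestion concrete, here is a minimal sketch of what a public, enum-like join type could look like. Everything in it is an assumption for illustration: `JoinKind`, `sqlName` and `fromString` are hypothetical names that exist in neither Spark nor this PR, and the set of join types listed is only a sample.

```scala
import java.util.Locale

// Hypothetical public enumeration of join types (NOT an existing Spark API).
// Each value carries the string name that the string-based `join` overload accepts.
sealed abstract class JoinKind(val sqlName: String)

object JoinKind {
  case object Inner extends JoinKind("inner")
  case object LeftOuter extends JoinKind("left_outer")
  case object RightOuter extends JoinKind("right_outer")
  case object FullOuter extends JoinKind("full_outer")
  case object LeftSemi extends JoinKind("left_semi")
  case object LeftAnti extends JoinKind("left_anti")

  // All values, in declaration order.
  val values: Seq[JoinKind] =
    Seq(Inner, LeftOuter, RightOuter, FullOuter, LeftSemi, LeftAnti)

  // Look up a value by name, ignoring case and underscores.
  // Returns None for an unknown name instead of throwing.
  def fromString(name: String): Option[JoinKind] = {
    val key = name.toLowerCase(Locale.ROOT).replace("_", "")
    values.find(_.sqlName.replace("_", "") == key)
  }
}
```

   With something along these lines, a new overload could take a `JoinKind` and forward `kind.sqlName` to the existing string-based `join`, so the catalyst `JoinType` would not need to appear in the public signature.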
