Posted to commits@spark.apache.org by gu...@apache.org on 2017/10/28 12:47:29 UTC

spark git commit: [SPARK-22335][SQL] Clarify union behavior on Dataset of typed objects in the document

Repository: spark
Updated Branches:
  refs/heads/master d28d5732a -> 683ffe062


[SPARK-22335][SQL] Clarify union behavior on Dataset of typed objects in the document

## What changes were proposed in this pull request?

End users can be confused by the behavior of `union` on a Dataset of typed objects. This change clarifies that behavior in the documentation of the `union` function.
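
To make the confusion concrete, here is a minimal sketch of the typed-object case. The case classes `A` and `B`, the object name, and the `run` helper are hypothetical names chosen for illustration and are not part of the patch. It assumes a local Spark session on Spark 2.3.0 or later, where `unionByName` is available.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case classes for illustration only: same fields, different order.
case class A(id: Int, score: Int)
case class B(score: Int, id: Int)

object UnionByPositionExample {
  // Returns (result of union, result of unionByName), collected to the driver.
  def run(spark: SparkSession): (Seq[A], Seq[A]) = {
    import spark.implicits._

    val ds1 = Seq(A(id = 1, score = 10)).toDS()
    // The schema of ds2 stays (score, id): `as[A]` binds fields by name
    // and does not reorder the underlying columns.
    val ds2 = Seq(B(score = 20, id = 2)).toDS().as[A]

    // union resolves columns by schema position, so the `score` column of ds2
    // is lined up with the `id` column of ds1.
    val byPosition = ds1.union(ds2).collect().toSeq

    // unionByName resolves columns by name instead.
    val byName = ds1.unionByName(ds2).collect().toSeq

    (byPosition, byName)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("union-by-position")
      .getOrCreate()
    try {
      val (byPosition, byName) = run(spark)
      println(s"union:       $byPosition")
      println(s"unionByName: $byName")
    } finally {
      spark.stop()
    }
  }
}
```

With `union`, the second row should come back with `id` and `score` swapped relative to the object that was put in, even though both Datasets are typed as `Dataset[A]`. With `unionByName`, the fields line up as written.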

## How was this patch tested?

This is a documentation-only change.

Author: Liang-Chi Hsieh <vi...@gmail.com>

Closes #19570 from viirya/SPARK-22335.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/683ffe06
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/683ffe06
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/683ffe06

Branch: refs/heads/master
Commit: 683ffe0620e69fd6e9f92c1037eef7996029aba8
Parents: d28d573
Author: Liang-Chi Hsieh <vi...@gmail.com>
Authored: Sat Oct 28 21:47:15 2017 +0900
Committer: hyukjinkwon <gu...@gmail.com>
Committed: Sat Oct 28 21:47:15 2017 +0900

----------------------------------------------------------------------
 .../scala/org/apache/spark/sql/Dataset.scala    | 21 +++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/683ffe06/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index fe4e192..bd99ec5 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1747,7 +1747,26 @@ class Dataset[T] private[sql](
    * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does
    * deduplication of elements), use this function followed by a [[distinct]].
    *
-   * Also as standard in SQL, this function resolves columns by position (not by name).
+   * Also as standard in SQL, this function resolves columns by position (not by name):
+   *
+   * {{{
+   *   val df1 = Seq((1, 2, 3)).toDF("col0", "col1", "col2")
+   *   val df2 = Seq((4, 5, 6)).toDF("col1", "col2", "col0")
+   *   df1.union(df2).show
+   *
+   *   // output:
+   *   // +----+----+----+
+   *   // |col0|col1|col2|
+   *   // +----+----+----+
+   *   // |   1|   2|   3|
+   *   // |   4|   5|   6|
+   *   // +----+----+----+
+   * }}}
+   *
+   * Notice that the column positions in the schema aren't necessarily matched with the
+   * fields in the strongly typed objects in a Dataset. This function resolves columns
+   * by their positions in the schema, not the fields in the strongly typed objects. Use
+   * [[unionByName]] to resolve columns by field name in the typed objects.
    *
    * @group typedrel
    * @since 2.0.0

