Posted to commits@spark.apache.org by rx...@apache.org on 2017/06/10 01:29:36 UTC
spark git commit: [SPARK-21042][SQL] Document Dataset.union is resolution by position
Repository: spark
Updated Branches:
refs/heads/master 571635488 -> b78e3849b
[SPARK-21042][SQL] Document Dataset.union is resolution by position
## What changes were proposed in this pull request?
Document that Dataset.union resolves columns by position, not by name, since this has been a confusing point for many users.
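The positional resolution this patch documents can be illustrated with a small self-contained sketch (plain Python, not Spark itself; the `union_by_position` helper and the frame-as-rows-plus-column-names model are hypothetical stand-ins for what `DataFrame.union` does internally):

```python
# Minimal illustration (NOT Spark) of the positional column resolution
# that Dataset.union / DataFrame.union performs. A "frame" is modeled
# as a list of column names plus a list of row tuples.

def union_by_position(cols_a, rows_a, cols_b, rows_b):
    """Concatenate rows of two frames. Column names are taken from the
    first frame; the second frame's columns are matched purely by
    position, never reordered to match names."""
    if len(cols_a) != len(cols_b):
        raise ValueError("union requires the same number of columns")
    return cols_a, rows_a + rows_b

# Two frames with the same column names in a different order.
cols1, rows1 = ["id", "name"], [(1, "a")]
cols2, rows2 = ["name", "id"], [("b", 2)]

cols, rows = union_by_position(cols1, rows1, cols2, rows2)
print(cols)  # names come from the first frame
print(rows)  # ("b", 2) is appended as-is: "b" lands in the id column
```

As in SQL's `UNION ALL`, the second frame's `("b", 2)` row is not reordered to line up with the column names, which is exactly the surprise the added documentation warns about.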
## How was this patch tested?
N/A - doc-only change.
Author: Reynold Xin <rx...@databricks.com>
Closes #18256 from rxin/SPARK-21042.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b78e3849
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b78e3849
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b78e3849
Branch: refs/heads/master
Commit: b78e3849b20d0d09b7146efd7ce8f203ef67b890
Parents: 5716354
Author: Reynold Xin <rx...@databricks.com>
Authored: Fri Jun 9 18:29:33 2017 -0700
Committer: Reynold Xin <rx...@databricks.com>
Committed: Fri Jun 9 18:29:33 2017 -0700
----------------------------------------------------------------------
R/pkg/R/DataFrame.R | 1 +
python/pyspark/sql/dataframe.py | 13 +++++++++----
.../src/main/scala/org/apache/spark/sql/Dataset.scala | 14 ++++++++------
3 files changed, 18 insertions(+), 10 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/R/pkg/R/DataFrame.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 166b398..3b9d42d 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -2646,6 +2646,7 @@ generateAliasesForIntersectedCols <- function (x, intersectedColNames, suffix) {
#' Input SparkDataFrames can have different schemas (names and data types).
#'
#' Note: This does not remove duplicate rows across the two SparkDataFrames.
+#' Also as standard in SQL, this function resolves columns by position (not by name).
#'
#' @param x A SparkDataFrame
#' @param y A SparkDataFrame
http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 99abfcc..8541403 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -1175,18 +1175,23 @@ class DataFrame(object):
@since(2.0)
def union(self, other):
- """ Return a new :class:`DataFrame` containing union of rows in this
- frame and another frame.
+ """ Return a new :class:`DataFrame` containing union of rows in this and another frame.
This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
(that does deduplication of elements), use this function followed by a distinct.
+
+ Also as standard in SQL, this function resolves columns by position (not by name).
"""
return DataFrame(self._jdf.union(other._jdf), self.sql_ctx)
@since(1.3)
def unionAll(self, other):
- """ Return a new :class:`DataFrame` containing union of rows in this
- frame and another frame.
+ """ Return a new :class:`DataFrame` containing union of rows in this and another frame.
+
+ This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
+ (that does deduplication of elements), use this function followed by a distinct.
+
+ Also as standard in SQL, this function resolves columns by position (not by name).
.. note:: Deprecated in 2.0, use union instead.
"""
http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index f7637e0..d28ff78 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1734,10 +1734,11 @@ class Dataset[T] private[sql](
/**
* Returns a new Dataset containing union of rows in this Dataset and another Dataset.
- * This is equivalent to `UNION ALL` in SQL.
*
- * To do a SQL-style set union (that does deduplication of elements), use this function followed
- * by a [[distinct]].
+ * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does
+ * deduplication of elements), use this function followed by a [[distinct]].
+ *
+ * Also as standard in SQL, this function resolves columns by position (not by name).
*
* @group typedrel
* @since 2.0.0
@@ -1747,10 +1748,11 @@ class Dataset[T] private[sql](
/**
* Returns a new Dataset containing union of rows in this Dataset and another Dataset.
- * This is equivalent to `UNION ALL` in SQL.
*
- * To do a SQL-style set union (that does deduplication of elements), use this function followed
- * by a [[distinct]].
+ * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does
+ * deduplication of elements), use this function followed by a [[distinct]].
+ *
+ * Also as standard in SQL, this function resolves columns by position (not by name).
*
* @group typedrel
* @since 2.0.0