Posted to commits@spark.apache.org by li...@apache.org on 2018/01/30 06:20:03 UTC
spark git commit: [SPARK-23157][SQL] Explain restriction on column
expression in withColumn()
Repository: spark
Updated Branches:
refs/heads/master b375397b1 -> 8b983243e
[SPARK-23157][SQL] Explain restriction on column expression in withColumn()
## What changes were proposed in this pull request?
It's not obvious from the existing comments that any column added with
withColumn() must be an expression over the dataset it is being added to.
Add a comment to that effect to the Scala, Python and R Dataset/DataFrame
withColumn() methods.
Author: Henry Robinson <he...@cloudera.com>
Closes #20429 from henryr/SPARK-23157.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8b983243
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8b983243
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8b983243
Branch: refs/heads/master
Commit: 8b983243e45dfe2617c043a3229a7d87f4c4b44b
Parents: b375397
Author: Henry Robinson <he...@cloudera.com>
Authored: Mon Jan 29 22:19:59 2018 -0800
Committer: gatorsmile <ga...@gmail.com>
Committed: Mon Jan 29 22:19:59 2018 -0800
----------------------------------------------------------------------
R/pkg/R/DataFrame.R | 3 ++-
python/pyspark/sql/dataframe.py | 4 ++++
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala | 3 +++
3 files changed, 9 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/8b983243/R/pkg/R/DataFrame.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 29f3e98..547b5ea 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -2090,7 +2090,8 @@ setMethod("selectExpr",
#'
#' @param x a SparkDataFrame.
#' @param colName a column name.
-#' @param col a Column expression, or an atomic vector in the length of 1 as literal value.
+#' @param col a Column expression (which must refer only to this DataFrame), or an atomic vector in
+#' the length of 1 as literal value.
#' @return A SparkDataFrame with the new column added or the existing column replaced.
#' @family SparkDataFrame functions
#' @aliases withColumn,SparkDataFrame,character-method
http://git-wip-us.apache.org/repos/asf/spark/blob/8b983243/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index ac40308..055b2c4 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -1829,11 +1829,15 @@ class DataFrame(object):
Returns a new :class:`DataFrame` by adding a column or replacing the
existing column that has the same name.
+ The column expression must be an expression over this dataframe; attempting to add
+ a column from some other dataframe will raise an error.
+
:param colName: string, name of the new column.
:param col: a :class:`Column` expression for the new column.
>>> df.withColumn('age2', df.age + 2).collect()
[Row(age=2, name=u'Alice', age2=4), Row(age=5, name=u'Bob', age2=7)]
+
"""
assert isinstance(col, Column), "col should be Column"
return DataFrame(self._jdf.withColumn(colName, col._jc), self.sql_ctx)
http://git-wip-us.apache.org/repos/asf/spark/blob/8b983243/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index cc5b647..d47cd0a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -2150,6 +2150,9 @@ class Dataset[T] private[sql](
* Returns a new Dataset by adding a column or replacing the existing column that has
* the same name.
*
+ * `column`'s expression must only refer to attributes supplied by this Dataset. It is an
+ * error to add a column that refers to some other Dataset.
+ *
* @group untypedrel
* @since 2.0.0
*/