Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2016/11/02 09:46:58 UTC

[jira] [Resolved] (SPARK-13913) DataFrame.withColumn fails when trying to replace existing column with dot in name

     [ https://issues.apache.org/jira/browse/SPARK-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-13913.
----------------------------------
    Resolution: Cannot Reproduce

I am resolving this as Cannot Reproduce, since I cannot reproduce it against master.

{code}
scala> val df = spark.range(1).selectExpr("struct(1) as a")
df: org.apache.spark.sql.DataFrame = [a: struct<col1: int>]

scala> df.withColumn("a.col1", df.col("a.col1")).show()
+---+------+
|  a|a.col1|
+---+------+
|[1]|     1|
+---+------+
{code}
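
The reporter's original scenario was replacing an existing top-level column whose name contains a dot; it can be checked the same way. A minimal sketch, with the column name borrowed from the report:

{code}
// Create a single-column DataFrame whose column name contains a dot.
scala> val df2 = spark.range(1).toDF("raw.hourOfDay")

// Backticks make "raw.hourOfDay" resolve as a literal top-level name
// rather than as a nested field; withColumn then replaces the column.
scala> df2.withColumn("raw.hourOfDay", df2.col("`raw.hourOfDay`") + 1).show()
{code}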

Please reopen this if I resolved it wrongly or if anyone still faces this issue.

> DataFrame.withColumn fails when trying to replace existing column with dot in name
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-13913
>                 URL: https://issues.apache.org/jira/browse/SPARK-13913
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Emmanuel Leroy
>
> http://stackoverflow.com/questions/36000147/spark-1-6-apply-function-to-column-with-dot-in-name-how-to-properly-escape-coln/36005334#36005334
> if I do the following (the column name already exists and has a dot in it, but it is not a nested column):
> scala> df = df.withColumn("raw.hourOfDay", df.col("`raw.hourOfDay`"))
> org.apache.spark.sql.AnalysisException: cannot resolve 'raw.minOfDay' given input columns raw.hourOfDay_2, raw.dayOfWeek, raw.sensor2, raw.hourOfDay, raw.minOfDay;
>         at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>         at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60)
>         at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:319)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:319)
>         at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:53)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:318)
>         at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:107)
>         at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:117)
>         at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:121)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.immutable.List.foreach(List.scala:318)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>         at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:121)
>         at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:125)
>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>         at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
> but if I do:
> scala> df = df.withColumn("raw.hourOfDay_2", df.col("`raw.hourOfDay`"))
> scala> df.printSchema
> root
>  |-- raw.hourOfDay: long (nullable = true)
>  |-- raw.minOfDay: long (nullable = true)
>  |-- raw.dayOfWeek: long (nullable = true)
>  |-- raw.sensor2: long (nullable = true)
>  |-- raw.hourOfDay_2: long (nullable = true)
> it works fine (i.e. the new column is created with a dot in its name).
> The only difference is that the name "raw.hourOfDay_2" does not exist yet, so it is properly created as a column name containing a dot, not as a nested column.
> The documentation, however, says that if the column exists it will be replaced, yet the column name seems to be misinterpreted as a nested column:
> def withColumn(colName: String, col: Column): DataFrame
> Returns a new DataFrame by adding a column or replacing the existing column that has the same name.
> Replacing a column whose name does not contain a dot works fine.
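
For reference, the distinction the report draws between nested-field access and a literal dotted column name can be sketched as follows (assuming a Spark 2.x shell; the names are illustrative):

{code}
// An unquoted dot is parsed as nested-field access: a.b is field b of struct a.
scala> val nested = spark.range(1).selectExpr("named_struct('b', 1) as a")
scala> nested.select(nested.col("a.b")).show()

// Backticks escape the dot, so `a.b` names a single top-level column.
scala> val dotted = spark.range(1).toDF("a.b")
scala> dotted.select(dotted.col("`a.b`")).show()
{code}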



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org