You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Takeshi Yamamuro (JIRA)" <ji...@apache.org> on 2016/08/25 16:37:21 UTC
[jira] [Commented] (SPARK-17237) DataFrame fill after pivot causing
org.apache.spark.sql.AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437184#comment-15437184 ]
Takeshi Yamamuro commented on SPARK-17237:
------------------------------------------
The root cause of this bug is nested backticks in column names.
UnresolvedAttribute#parseAttributeName cannot handle the nested backtics.
> DataFrame fill after pivot causing org.apache.spark.sql.AnalysisException
> -------------------------------------------------------------------------
>
> Key: SPARK-17237
> URL: https://issues.apache.org/jira/browse/SPARK-17237
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Jiang Qiqi
> Labels: newbie
>
> I am trying to run a pivot transformation which I ran on a spark1.6 cluster,
> namely
> sc.parallelize(Seq((2,3,4), (3,4,5))).toDF("a", "b", "c")
> res1: org.apache.spark.sql.DataFrame = [a: int, b: int, c: int]
> scala> res1.groupBy("a").pivot("b").agg(count("c"), avg("c")).na.fill(0)
> res2: org.apache.spark.sql.DataFrame = [a: int, 3_count(c): bigint, 3_avg(c): double, 4_count(c): bigint, 4_avg(c): double]
> scala> res1.groupBy("a").pivot("b").agg(count("c"), avg("c")).na.fill(0).show
> +---+----------+--------+----------+--------+
> | a|3_count(c)|3_avg(c)|4_count(c)|4_avg(c)|
> +---+----------+--------+----------+--------+
> | 2| 1| 4.0| 0| 0.0|
> | 3| 0| 0.0| 1| 5.0|
> +---+----------+--------+----------+--------+
> after upgrade the environment to spark2.0, got an error while executing .na.fill method
> scala> sc.parallelize(Seq((2,3,4), (3,4,5))).toDF("a", "b", "c")
> res3: org.apache.spark.sql.DataFrame = [a: int, b: int ... 1 more field]
> scala> res3.groupBy("a").pivot("b").agg(count("c"), avg("c")).na.fill(0)
> org.apache.spark.sql.AnalysisException: syntax error in attribute name: `3_count(`c`)`;
> at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:103)
> at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:113)
> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:168)
> at org.apache.spark.sql.Dataset.resolve(Dataset.scala:218)
> at org.apache.spark.sql.Dataset.col(Dataset.scala:921)
> at org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:411)
> at org.apache.spark.sql.DataFrameNaFunctions$$anonfun$2.apply(DataFrameNaFunctions.scala:162)
> at org.apache.spark.sql.DataFrameNaFunctions$$anonfun$2.apply(DataFrameNaFunctions.scala:159)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
> at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:159)
> at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:149)
> at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:134)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org