Posted to issues@spark.apache.org by "Eric Liang (JIRA)" <ji...@apache.org> on 2016/11/10 00:10:58 UTC
[jira] [Created] (SPARK-18393) DataFrame pivot output column names should respect aliases
Eric Liang created SPARK-18393:
----------------------------------
Summary: DataFrame pivot output column names should respect aliases
Key: SPARK-18393
URL: https://issues.apache.org/jira/browse/SPARK-18393
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Eric Liang
Priority: Minor
For example:
{code}
val df = spark.range(100).selectExpr("id % 5 as x", "id % 2 as a", "id as b")
df
  .groupBy('x)
  .pivot("a", Seq(0, 1))
  .agg(expr("sum(b)").as("blah"), expr("count(b)").as("foo"))
  .show()
+---+--------------------+---------------------+--------------------+---------------------+
|  x|0_sum(`b`) AS `blah`|0_count(`b`) AS `foo`|1_sum(`b`) AS `blah`|1_count(`b`) AS `foo`|
+---+--------------------+---------------------+--------------------+---------------------+
|  0|                 450|                   10|                 500|                   10|
|  1|                 510|                   10|                 460|                   10|
|  3|                 530|                   10|                 480|                   10|
|  2|                 470|                   10|                 520|                   10|
|  4|                 490|                   10|                 540|                   10|
+---+--------------------+---------------------+--------------------+---------------------+
{code}
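The column names here are quite hard to read because each one embeds the full aggregate expression, including the "AS" clause, rather than just the alias. Ideally we would respect the aliases and generate column names like 0_blah, 0_foo, 1_blah, 1_foo instead.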
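Until pivot does this itself, the names can be cleaned up after the fact. The following is only a rough sketch of such a workaround, not the proposed fix: PivotAliasNames and simplify are hypothetical names, and the regex assumes the pivot value contains no underscore and that the name has exactly the shape shown in the output above.

```scala
// Hypothetical helper, not part of Spark: rewrites pivot output names of the
// form  <pivotValue>_<expr> AS `<alias>`  to  <pivotValue>_<alias>.
object PivotAliasNames {
  // Group 1: text before the first underscore (assumed to be the pivot value).
  // Group 2: the alias between the trailing backticks.
  private val Aliased = """^(.*?)_.* AS `([^`]+)`$""".r

  def simplify(name: String): String = name match {
    case Aliased(pivotValue, alias) => s"${pivotValue}_$alias"
    case other                      => other // e.g. the grouping column "x"
  }
}

val names = Seq(
  "x",
  "0_sum(`b`) AS `blah`",
  "0_count(`b`) AS `foo`",
  "1_sum(`b`) AS `blah`",
  "1_count(`b`) AS `foo`")

// Seq("x", "0_blah", "0_foo", "1_blah", "1_foo")
val simplified = names.map(PivotAliasNames.simplify)
```

Given the pivoted DataFrame from the example bound to a value, say pivoted (also a hypothetical name), the renaming could be applied with pivoted.toDF(pivoted.columns.map(PivotAliasNames.simplify): _*).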