Posted to issues@spark.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2015/06/29 21:54:04 UTC
[jira] [Updated] (SPARK-8712) Window function does not work with
distinct aggregations
[ https://issues.apache.org/jira/browse/SPARK-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai updated SPARK-8712:
----------------------------
Description:
It seems that we ignore the distinct keyword. Also, on our master, I got
{code}
scala> Seq((1, 2, 3)).toDF("i", "j", "k").registerTempTable("t")
scala> sql("select count(distinct j) over (partition by i) from t").explain(true)
== Parsed Logical Plan ==
'Project [UnresolvedAlias
CountDistinct
UnresolvedAttribute [j]
]
'UnresolvedRelation [t], None
== Analyzed Logical Plan ==
_c0: bigint
Aggregate [COUNT(DISTINCT j#23) AS _c0#27L]
Subquery t
Project [_1#19 AS i#22,_2#20 AS j#23,_3#21 AS k#24]
LocalRelation [_1#19,_2#20,_3#21], [[1,2,3]]
== Optimized Logical Plan ==
Aggregate [COUNT(DISTINCT j#23) AS _c0#27L]
LocalRelation [j#23], [[2]]
== Physical Plan ==
GeneratedAggregate false, [CombineAndCount(partialSets#28) AS _c0#27L], false
Exchange SinglePartition
GeneratedAggregate true, [AddToHashSet(j#23) AS partialSets#28], false
LocalTableScan [j#23], [[2]]
Code Generation: true
== RDD ==
scala> sql("select count(j) over (partition by i) from t").explain(true)
== Parsed Logical Plan ==
'Project [UnresolvedAlias
WindowExpression
UnresolvedWindowFunction count
UnresolvedAttribute [j]
WindowSpecDefinition UnspecifiedFrame
UnresolvedAttribute [i]
]
'UnresolvedRelation [t], None
== Analyzed Logical Plan ==
_c0: bigint
Project [_c0#31L]
Project [j#23,i#22,_c0#31L,_c0#31L]
Window [j#23,i#22], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
Project [j#23,i#22]
Subquery t
Project [_1#19 AS i#22,_2#20 AS j#23,_3#21 AS k#24]
LocalRelation [_1#19,_2#20,_3#21], [[1,2,3]]
== Optimized Logical Plan ==
Project [_c0#31L]
Window [j#23,i#22], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
LocalRelation [j#23,i#22], [[2,1]]
== Physical Plan ==
Project [_c0#31L]
Window [j#23,i#22], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
ExternalSort [i#22 ASC], false
Exchange (HashPartitioning 200)
LocalTableScan [j#23,i#22], [[2,1]]
Code Generation: true
== RDD ==
{code}
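For comparison, in the plans above the distinct query is analyzed as a plain Aggregate with no Window operator, while the non-distinct query gets a Window operator. The sketch below shows the result one would expect from count(distinct j) over (partition by i), computed on plain Scala collections rather than through Spark. The object and method names are hypothetical and exist only for illustration, and the sample rows beyond (1, 2, 3) are made up to give the partitions something to distinguish.

```scala
// Hypothetical helper, not part of Spark: the expected semantics of
//   select count(distinct j) over (partition by i) from t
// expressed on plain Scala collections.
object ExpectedDistinctWindowCount {

  // Rows mirror the (i, j, k) layout of table t in the report.
  def countDistinctJOverI(rows: Seq[(Int, Int, Int)]): Seq[Long] = {
    // Number of distinct j values within each partition i.
    val distinctPerPartition: Map[Int, Long] =
      rows.groupBy(_._1).map { case (i, group) =>
        i -> group.map(_._2).distinct.size.toLong
      }
    // A window function emits one output row per input row,
    // unlike a plain aggregate, which collapses the input.
    rows.map { case (i, _, _) => distinctPerPartition(i) }
  }

  def main(args: Array[String]): Unit = {
    // Assumed sample data: partition i=1 has j values 2, 2, 5; partition i=2 has 7.
    val rows = Seq((1, 2, 3), (1, 2, 4), (1, 5, 6), (2, 7, 8))
    println(countDistinctJOverI(rows)) // prints List(2, 2, 2, 1)
  }
}
```

With the single row from the report, (1, 2, 3), this yields one output row with the value 1. The point of the sketch is only that the output keeps one row per input row and counts distinct values within each partition.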
was: Seems we ignored distinct keyword.
> Window function does not work with distinct aggregations
> --------------------------------------------------------
>
> Key: SPARK-8712
> URL: https://issues.apache.org/jira/browse/SPARK-8712
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0
> Reporter: Yin Huai
> Assignee: Yin Huai