Posted to issues@spark.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2015/06/29 23:57:04 UTC
[jira] [Commented] (SPARK-8712) Window function does not work with distinct aggregations
[ https://issues.apache.org/jira/browse/SPARK-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606483#comment-14606483 ]
Yin Huai commented on SPARK-8712:
---------------------------------
It seems that Hive's parser silently drops the window specification when DISTINCT is present, so the query is planned as a plain aggregate rather than as a window function.
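To make the expected semantics concrete, here is a minimal sketch in plain Scala collections (no Spark involved) of what `count(distinct j) over (partition by i)` should return, next to the single global aggregate that the analyzed plan in the issue description produces. The helper names and the extra sample rows are assumptions made for illustration; they are not part of Spark or of the original report.

```scala
// Hypothetical helper, not part of Spark: models the intended result of
//   select count(distinct j) over (partition by i) from t
// on an in-memory collection of (i, j, k) rows.
def countDistinctOverPartition(rows: Seq[(Int, Int, Int)]): Seq[Long] = {
  // Number of distinct j values for each partition key i.
  val distinctPerPartition: Map[Int, Long] =
    rows.groupBy(_._1).map { case (i, group) =>
      i -> group.map(_._2).distinct.size.toLong
    }
  // A window function keeps one output row per input row.
  rows.map { case (i, _, _) => distinctPerPartition(i) }
}

// Hypothetical helper: models the plan shown in the issue, where the window
// specification is lost and a single global COUNT(DISTINCT j) is computed.
def countDistinctGlobal(rows: Seq[(Int, Int, Int)]): Long =
  rows.map(_._2).distinct.size.toLong

// Sample rows are an assumption; the report itself only uses (1, 2, 3).
val t = Seq((1, 2, 3), (1, 2, 4), (1, 5, 6), (2, 7, 8))
val windowed = countDistinctOverPartition(t)
val global = countDistinctGlobal(t)
```

With more than one partition the two results differ in both shape and value: the windowed form yields one count per input row, while the global aggregate collapses the table to a single row.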
> Window function does not work with distinct aggregations
> --------------------------------------------------------
>
> Key: SPARK-8712
> URL: https://issues.apache.org/jira/browse/SPARK-8712
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0
> Reporter: Yin Huai
> Assignee: Yin Huai
>
> It seems that we ignore the DISTINCT keyword. Also, on our current master, I got the following:
> {code}
> scala> Seq((1, 2, 3)).toDF("i", "j", "k").registerTempTable("t")
> scala> sql("select count(distinct j) over (partition by i) from t").explain(true)
> == Parsed Logical Plan ==
> 'Project [UnresolvedAlias
> CountDistinct
> UnresolvedAttribute [j]
> ]
> 'UnresolvedRelation [t], None
> == Analyzed Logical Plan ==
> _c0: bigint
> Aggregate [COUNT(DISTINCT j#23) AS _c0#27L]
> Subquery t
> Project [_1#19 AS i#22,_2#20 AS j#23,_3#21 AS k#24]
> LocalRelation [_1#19,_2#20,_3#21], [[1,2,3]]
> == Optimized Logical Plan ==
> Aggregate [COUNT(DISTINCT j#23) AS _c0#27L]
> LocalRelation [j#23], [[2]]
> == Physical Plan ==
> GeneratedAggregate false, [CombineAndCount(partialSets#28) AS _c0#27L], false
> Exchange SinglePartition
> GeneratedAggregate true, [AddToHashSet(j#23) AS partialSets#28], false
> LocalTableScan [j#23], [[2]]
> Code Generation: true
> == RDD ==
> scala> sql("select count(j) over (partition by i) from t").explain(true)
> == Parsed Logical Plan ==
> 'Project [UnresolvedAlias
> WindowExpression
> UnresolvedWindowFunction count
> UnresolvedAttribute [j]
> WindowSpecDefinition UnspecifiedFrame
> UnresolvedAttribute [i]
> ]
> 'UnresolvedRelation [t], None
> == Analyzed Logical Plan ==
> _c0: bigint
> Project [_c0#31L]
> Project [j#23,i#22,_c0#31L,_c0#31L]
> Window [j#23,i#22], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
> Project [j#23,i#22]
> Subquery t
> Project [_1#19 AS i#22,_2#20 AS j#23,_3#21 AS k#24]
> LocalRelation [_1#19,_2#20,_3#21], [[1,2,3]]
> == Optimized Logical Plan ==
> Project [_c0#31L]
> Window [j#23,i#22], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
> LocalRelation [j#23,i#22], [[2,1]]
> == Physical Plan ==
> Project [_c0#31L]
> Window [j#23,i#22], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
> ExternalSort [i#22 ASC], false
> Exchange (HashPartitioning 200)
> LocalTableScan [j#23,i#22], [[2,1]]
> Code Generation: true
> == RDD ==
> {code}