Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/07/28 06:04:05 UTC

[jira] [Commented] (SPARK-8712) Hive's Parser does not support distinct aggregations with OVER clause

    [ https://issues.apache.org/jira/browse/SPARK-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643818#comment-14643818 ] 

Apache Spark commented on SPARK-8712:
-------------------------------------

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/7715

> Hive's Parser does not support distinct aggregations with OVER clause
> ---------------------------------------------------------------------
>
>                 Key: SPARK-8712
>                 URL: https://issues.apache.org/jira/browse/SPARK-8712
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Yin Huai
>
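> Hive's parser ignores the window specification (the OVER clause) when a distinct aggregation is used. In the first query below, the parsed plan is a plain CountDistinct aggregate with no WindowExpression; the same query without DISTINCT is parsed into a WindowExpression as expected.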
> {code}
> scala> Seq((1, 2, 3)).toDF("i", "j", "k").registerTempTable("t")
> scala> sql("select count(distinct j) over (partition by i) from t").explain(true)
> == Parsed Logical Plan ==
> 'Project [UnresolvedAlias
>  CountDistinct
>   UnresolvedAttribute [j]
> ]
>  'UnresolvedRelation [t], None
> == Analyzed Logical Plan ==
> _c0: bigint
> Aggregate [COUNT(DISTINCT j#23) AS _c0#27L]
>  Subquery t
>   Project [_1#19 AS i#22,_2#20 AS j#23,_3#21 AS k#24]
>    LocalRelation [_1#19,_2#20,_3#21], [[1,2,3]]
> == Optimized Logical Plan ==
> Aggregate [COUNT(DISTINCT j#23) AS _c0#27L]
>  LocalRelation [j#23], [[2]]
> == Physical Plan ==
> GeneratedAggregate false, [CombineAndCount(partialSets#28) AS _c0#27L], false
>  Exchange SinglePartition
>   GeneratedAggregate true, [AddToHashSet(j#23) AS partialSets#28], false
>    LocalTableScan [j#23], [[2]]
> Code Generation: true
> == RDD ==
> scala> sql("select count(j) over (partition by i) from t").explain(true)
> == Parsed Logical Plan ==
> 'Project [UnresolvedAlias
>  WindowExpression
>   UnresolvedWindowFunction count
>    UnresolvedAttribute [j]
>   WindowSpecDefinition UnspecifiedFrame
>    UnresolvedAttribute [i]
> ]
>  'UnresolvedRelation [t], None
> == Analyzed Logical Plan ==
> _c0: bigint
> Project [_c0#31L]
>  Project [j#23,i#22,_c0#31L,_c0#31L]
>   Window [j#23,i#22], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>    Project [j#23,i#22]
>     Subquery t
>      Project [_1#19 AS i#22,_2#20 AS j#23,_3#21 AS k#24]
>       LocalRelation [_1#19,_2#20,_3#21], [[1,2,3]]
> == Optimized Logical Plan ==
> Project [_c0#31L]
>  Window [j#23,i#22], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>   LocalRelation [j#23,i#22], [[2,1]]
> == Physical Plan ==
> Project [_c0#31L]
>  Window [j#23,i#22], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>   ExternalSort [i#22 ASC], false
>    Exchange (HashPartitioning 200)
>     LocalTableScan [j#23,i#22], [[2,1]]
> Code Generation: true
> == RDD ==
> {code}
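
For reference, the result the first query is presumably meant to produce can be modelled in plain Scala without Spark. This is only a sketch of the intended semantics of count(distinct j) over (partition by i), not the parser fix from the pull request; the names Row, countDistinctOver and plainAggregate, and the extra sample rows, are assumptions made for illustration.

```scala
// Plain-Scala model of:
//   select count(distinct j) over (partition by i) from t
// Row, countDistinctOver and plainAggregate are hypothetical names for this sketch.
final case class Row(i: Int, j: Int, k: Int)

def countDistinctOver(rows: Seq[Row]): Seq[Long] = {
  // Number of distinct j values within each partition, keyed by i.
  val distinctPerPartition: Map[Int, Long] =
    rows.groupBy(_.i).map { case (key, group) =>
      key -> group.map(_.j).distinct.size.toLong
    }
  // A window function emits one output row per input row.
  rows.map(r => distinctPerPartition(r.i))
}

// Sample data (assumed; the issue itself uses the single row (1, 2, 3)).
val t = Seq(Row(1, 2, 3), Row(1, 2, 4), Row(1, 5, 6), Row(2, 7, 8))

// Windowed semantics: one value per input row.
val result = countDistinctOver(t)

// What the reported plan computes instead: a single global COUNT(DISTINCT j).
val plainAggregate = Seq(t.map(_.j).distinct.size.toLong)
```

With the OVER clause honoured, the output has as many rows as the input; the plan in the report collapses the input to a single row, which is the visible symptom of the dropped window spec.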


