You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Herman van Hovell (JIRA)" <ji...@apache.org> on 2016/11/16 14:20:58 UTC

[jira] [Commented] (SPARK-8816) Improve Performance of Unbounded Following Window Frame Processing

    [ https://issues.apache.org/jira/browse/SPARK-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670524#comment-15670524 ] 

Herman van Hovell commented on SPARK-8816:
------------------------------------------

The following paper provides a nice solution if we ever want to improve this: http://www.vldb.org/pvldb/vol8/p1058-leis.pdf

It would require random access though (which we do not have for windows larger than 4000 rows).

> Improve Performance of Unbounded Following Window Frame Processing
> ------------------------------------------------------------------
>
>                 Key: SPARK-8816
>                 URL: https://issues.apache.org/jira/browse/SPARK-8816
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Herman van Hovell
>            Priority: Minor
>
> The performance of Unbounded Following frames in both the current and the proposed (SPARK-8638) implementation of Window Functions is quite bad: O(N*(N-1)/2).
> A solution to this is to process such frames in reverse. This would effectively reduce the complexity to O(N). The problem with this approach that it assumes  that AggregateExpression are communitative. Most are, but some are actually order based: FIRST/LAST. There are two solution for this:
> * Only allow communitative aggregates to processed in reverse order. In practice this would mean, that a white list containing all allowed aggregates  is used, e.g. Sum, Average, Min, Max, ...
> * Add functionality to WindowFunction or even better AggregateExpression, which would allow us to get the reverse operator from the expression, and use this reverse for processing, for example:
> {noformat}
> case class Max(child: Expression) extends AggregateExpression with Ordered {
>   ...
>   
>   def reverse: Min = Min(child)
> }
> {noformat}
> The impact and extensibility of the first option are lower than of the second. 
> We might also want to asses how often such frames are used. It might not be a problem at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org