You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Xin Wu (JIRA)" <ji...@apache.org> on 2016/04/04 07:56:25 UTC

[jira] [Updated] (SPARK-14361) Support EXCLUDE clause in Window function framing

     [ https://issues.apache.org/jira/browse/SPARK-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xin Wu updated SPARK-14361:
---------------------------
    Description: The current Spark SQL does not support the {code}exclude{code} clause in Window function framing clause, which is part of ANSI SQL2003's   (was: The current Spark SQL does not support the `exclusion` clause, which is part of ANSI 

SQL2003’s `Window` syntax. For example, IBM Netezza fully supports it as shown in the 

[document web link]

(https://www.ibm.com/support/knowledgecenter/SSULQD_7.1.0/com.ibm.nz.dbu.doc/c_dbuser_wi

ndow_aggregation_family_syntax.html). We propose to support it in this JIRA. 


##### Introduction

Below is the ANSI SQL2003’s `Window` syntax:
```
FUNCTION_NAME(expr) OVER {window_name | (window_specification)}
window_specification ::= [window_name] [partitioning] [ordering] [framing]
partitioning ::= PARTITION BY value[, value...] [COLLATE collation_name]
ordering ::= ORDER [SIBLINGS] BY rule[, rule...]
rule ::= {value | position | alias} [ASC | DESC] [NULLS {FIRST | LAST}]
framing ::= {ROWS | RANGE} {start | between} [exclusion]
start ::= {UNBOUNDED PRECEDING | unsigned-integer PRECEDING | CURRENT ROW}
between ::= BETWEEN bound AND bound
bound ::= {start | UNBOUNDED FOLLOWING | unsigned-integer FOLLOWING}
exclusion ::= {EXCLUDE CURRENT ROW | EXCLUDE GROUP | EXCLUDE TIES | EXCLUDE NO OTHERS}
```
Exclusion clause can be used to excluded certain rows from the window framing when 

calculating window aggregation function (e.g. AVG, SUM, MAX, MIN, COUNT, etc)  related 

to current row. Types of window functions that are not supported are listed below:

1. Offset functions, such as lead(), lag()
2. Ranking functions, such as rank(), dense_rank(), percent_rank(), cume_dist, ntile()
3. Row number function, such as row_number()

##### Definition
Syntax | Description
------------ | -------------
EXCLUDE CURRENT ROW | Specifies excluding the current row.
EXCLUDE GROUP | Specifies excluding the current row and all rows that are tied with it. 

Ties occur when there is a match on the order column or columns.
EXCLUDE NO OTHERS | Specifies not excluding any rows. This value is the default if you 

specify no exclusion.
EXCLUDE TIES | Specifies excluding all rows that are tied with the current row (peer 

rows), but retaining the current row.

##### Use-case Examples:

- Let's say you want to find out for every employee, where is his/her salary at compared 

to the average salary of those within the same department and whose ages are within 5 

years younger and older. The query could be:

```SQL
SELECT NAME, DEPT_ID, SALARY, AGE, AVG(SALARY) AS AVG_WITHIN_5_YEAR
    OVER(PARTITION BY DEPT_ID 
         ORDER BY AGE 
         RANGE BETWEEN 5 PRECEDING AND 5 FOLLOWING 
         EXCLUDE CURRENT ROW) 
FROM EMPLOYEE

```

- Let's say you want to compare every customer's yearly purchase with other customers' 

average yearly purchase who are at different age group from the current customer. The 

query could be:

```SQL
SELECT CUST_NAME, AGE, PROD_CATEGORY, YEARLY_PURCHASE, AVG(YEARLY_PURCHASE) 
    OVER(PARTITION BY PROD_CATEGORY 
         ORDER BY AGE 
         RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUND FOLLOWING 
         EXCLUDE GROUP) 
FROM CUSTOMER_PURCHASE_SUM

```)

> Support EXCLUDE clause in Window function framing
> -------------------------------------------------
>
>                 Key: SPARK-14361
>                 URL: https://issues.apache.org/jira/browse/SPARK-14361
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Xin Wu
>
> The current Spark SQL does not support the {code}exclude{code} clause in Window function framing clause, which is part of ANSI SQL2003's 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org