Posted to dev@hive.apache.org by "Harish Butani (JIRA)" <ji...@apache.org> on 2012/12/23 19:48:12 UTC

[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.

    [ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539072#comment-13539072 ] 

Harish Butani commented on HIVE-896:
------------------------------------

Hi,

We are posting a preliminary patch for a Partitioned Table Function (PTF) mechanism and 
Windowing clause support built on top of it. The solution lets you invoke a 
Partitioned Table Function anywhere a Table/SubQuery can appear in HQL.
The Windowing clause support matches standard SQL as much as possible: 
you can define windows at the Query level or on an individual Function, and you can 
specify a range- or value-based window with any UDAF. But since Windowing is 
handled as a PTF invocation, all Window specifications must have the same Partition 
and Order specification.
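
To illustrate the shared Partition/Order constraint described above, here is a 
minimal Python sketch. It is not Hive code and does not reflect the patch's 
implementation; the names apply_windowing, lag and rows_sum, and the sample 
click data, are hypothetical and chosen only for illustration. The idea is that 
rows are partitioned and ordered once, and every window function is then 
evaluated over that same ordered partition.

```python
from itertools import groupby
from operator import itemgetter


def apply_windowing(rows, partition_by, order_by, functions):
    """Partition and order the rows once, then evaluate every window
    function against that same ordered partition (illustrative only)."""
    ordered = sorted(rows, key=lambda r: (r[partition_by], r[order_by]))
    out = []
    for _, group in groupby(ordered, key=itemgetter(partition_by)):
        partition = list(group)
        for i, row in enumerate(partition):
            result = dict(row)
            for name, fn in functions.items():
                result[name] = fn(partition, i)
            out.append(result)
    return out


def lag(column, offset=1):
    """Value of `column` `offset` rows earlier in the partition, else None."""
    def fn(partition, i):
        return partition[i - offset][column] if i - offset >= 0 else None
    return fn


def rows_sum(column, preceding, following):
    """Sum of `column` over a row-based window around the current row."""
    def fn(partition, i):
        lo = max(0, i - preceding)
        hi = min(len(partition), i + following + 1)
        return sum(r[column] for r in partition[lo:hi])
    return fn


# Hypothetical click-stream sample.
clicks = [
    {"user": "a", "ts": 2, "bytes": 20},
    {"user": "b", "ts": 1, "bytes": 5},
    {"user": "a", "ts": 1, "bytes": 10},
    {"user": "a", "ts": 3, "bytes": 30},
]

result = apply_windowing(
    clicks,
    partition_by="user",
    order_by="ts",
    functions={
        "prev_bytes": lag("bytes"),
        "moving_sum": rows_sum("bytes", preceding=1, following=0),
    },
)
```

Both prev_bytes and moving_sum are computed over the same (user, ts) 
partitioning; supporting a different Partition or Order per function would 
require a separate pass over the data.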

You can read about the details in a (work-in-progress) document 
here: http://tinyurl.com/ck4nopn. We have added a lot of tests to showcase the 
functionality. A good starting point is ptf_general_queries.q, which has 49 queries.

But let us emphasize that this is a preliminary patch. We wanted to get this out early 
to get your feedback sooner rather than later. We need to do a lot of cleanup, 
refactoring and documentation. The starting point was our SQLWindowing on top of Hive 
project, which used Hive's metadata and runtime components but had its own Query form, 
so some components still reflect the assumptions from that project. We started by 
taking all the code from that project and placing it in the ql.ptf package. 
We have gradually been moving the code out of this package into the rest of ql, but we still have some 
way to go. For background, it may help to look at our Hadoop Summit 
presentation (http://tinyurl.com/bm4qb7z).

Finally, and most importantly, we are not completely finished. We are missing support for 
Queries with multiple Inserts. We also have to address the case of Queries with aggregations 
with no group by and with constants as columns in the Select List. On the entire ql 
test suite there are still around 15 failures because of these 2 issues.

Harish Butani, Prajakta Kalmegh
                
> Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
> ---------------------------------------------------------------
>
>                 Key: HIVE-896
>                 URL: https://issues.apache.org/jira/browse/HIVE-896
>             Project: Hive
>          Issue Type: New Feature
>          Components: OLAP, UDF
>            Reporter: Amr Awadallah
>            Priority: Minor
>
> Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics.
> More details at:
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
> -- amr
