You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by "Butani, Harish" <ha...@sap.com> on 2012/01/17 07:11:08 UTC

SQL windowing functions.( Issue HIVE-896)

Hi,



I have been developing a solution that works with Hive. Here is a simple e.g.



from <select county, tract, arealand from geo_header_sf1 where sumlev = 140>

partition by county

order by county, arealand desc

with

  rank() as r,

  sum(arealand) over rows between unbounded preceding and current row as cum_area

select county, tract, arealand, r, cum_area where <r <= 3>

into path='/tmp/wout'



We have loaded the Census data into our Hive instance. The Query is calculating the Top 3 Tracts(based on land area) by County.



Windowing Queries can be expressed on any Hive Query or Table or on an HDFS file. Windowing processing works in MR mode

or in Hive mode. MR mode generates a MR job to evaluate the windowing functions. In Hive Mode, the Windowing process is

invoked from the Hive Script Operator. Support for 16 functions divided into: Ranking, Aggregation and navigation.

Support for both value and range boundaries for Windows. Easy to extend through groovy integration. Code is available

at https://github.com/hbutani/SQLWindowing. A writeup is available at https://github.com/hbutani/SQLWindowing/blob/master/docs/Windowing.pdf?raw=true



Please send me an email if you are interested in using this.



regards,

Harish Butani