Posted to user@hive.apache.org by Dave Houston <ro...@crankyadmin.net> on 2012/01/11 15:54:08 UTC

Lag & Lead

Hi guys,

Trying to calculate the dwell time of pages in a weblog. In Oracle we would use the LEAD analytic function to find the next row for a particular cookie. What is the best approach for Hive?

Thanks 

Dave

Dave Houston
root@crankyadmin.net




Re: Lag & Lead

Posted by Mark Grover <mg...@oanda.com>.
Dave,
I had a similar need for the "first" function, but since the Hive ticket Ed mentioned is still unresolved, I ended up writing a reducer (pluggable into Hive via the "transform" functionality) that returned the first row. In your example, you would "distribute by" the cookie before you send the data to the reducer.

You could look into doing something similar as well. Perhaps a nicer way would be to write a UDAF, but the reducer works fine for me.
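
A rough sketch of what such a reducer could look like for the dwell-time case, written as a Python streaming script. This is not the reducer described above; it is only an illustration under assumptions: the column names (cookie, ts, page), the table name weblog, and the script name dwell.py are hypothetical, ts is assumed to be an integer epoch timestamp in seconds, and rows are assumed to reach the script grouped by cookie and sorted by ts. The last page seen for each cookie has no following row, so its dwell time is emitted as NULL.

```python
#!/usr/bin/env python
"""Streaming reducer sketch for Hive TRANSFORM that emulates LEAD per cookie.

Hypothetical invocation (table, column and file names are assumptions):

    ADD FILE dwell.py;
    SELECT TRANSFORM (cookie, ts, page)
           USING 'python dwell.py'
           AS (cookie, page, ts, dwell_seconds)
    FROM (SELECT cookie, ts, page
          FROM weblog
          DISTRIBUTE BY cookie
          SORT BY cookie, ts) t;
"""
import sys

# Literal string that Hive's transform text format uses for NULL values.
HIVE_NULL = "\\N"


def parse(lines):
    """Turn tab-separated input lines into (cookie, ts, page) tuples."""
    for line in lines:
        cookie, ts, page = line.rstrip("\n").split("\t")
        yield cookie, int(ts), page


def dwell_times(rows):
    """Yield (cookie, page, ts, dwell) for each row.

    dwell is the gap to the next row of the same cookie, as a string,
    or HIVE_NULL when the row is the last one for its cookie.
    Assumes rows are grouped by cookie and sorted by ts.
    """
    prev = None
    for cookie, ts, page in rows:
        if prev is not None:
            p_cookie, p_ts, p_page = prev
            # Only compare against the next row when it has the same cookie.
            dwell = str(ts - p_ts) if p_cookie == cookie else HIVE_NULL
            yield p_cookie, p_page, p_ts, dwell
        prev = (cookie, ts, page)
    # Flush the final buffered row, which has no successor.
    if prev is not None:
        yield prev[0], prev[2], prev[1], HIVE_NULL


if __name__ == "__main__":
    for cookie, page, ts, dwell in dwell_times(parse(sys.stdin)):
        print("\t".join([cookie, page, str(ts), dwell]))
```

The script only buffers one row at a time, so it relies entirely on the "distribute by" and "sort by" in the query to bring each cookie's rows together in time order.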

Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation 

www: oanda.com www: fxtrade.com 
e: mgrover@oanda.com 


----- Original Message -----
From: "Edward Capriolo" <ed...@gmail.com>
To: user@hive.apache.org
Sent: Wednesday, January 11, 2012 12:02:08 PM
Subject: Re: Lag & Lead


See this for discussion. 



https://issues.apache.org/jira/browse/HIVE-896 

Re: Lag & Lead

Posted by Edward Capriolo <ed...@gmail.com>.
See this for discussion.

https://issues.apache.org/jira/browse/HIVE-896
