You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by kumar mr <ku...@aol.com> on 2013/02/21 08:33:16 UTC

ROW_NUMBER() equivalent in Hive

Hi,


This is Kumar, and this is my first question in this group.


I have a requirement to implement ROW_NUMBER() from Teradata in Hive where partitioning happens on multiple columns along with multiple column ordering. 
It can be easily implemented in Hadoop MR, but I have to do in Hive. By doing in UDF can assign same rank to grouping key considering dataset is small, but ordering need to be done in prior step.
Can we do this in lot simpler way? 


Thanks in advance.


Regards,
Kumar

Re: ROW_NUMBER() equivalent in Hive

Posted by Ashutosh Chauhan <ha...@apache.org>.

Hi Stephen,

As I indicated in my previous email, check out file  ql/src/test/queries/
clientpositive/ptf_general_queries.q it has plenty of example queries
demonstrating the functionality which is available. If you are interested
in hive src changes which has enabled this feature.. you may want to start
by looking at a patch attached on HIVE-896 which was the starting point for
this work. That jira also has links with other jira which we did /are doing
on top of that patch.

Hope it helps,
Ashutosh

On Thu, Feb 21, 2013 at 12:17 PM, Stephen Boesch <ja...@gmail.com> wrote:

> Hi Ashutosh,
>    I am interested / reviewing your windowing feature.  Can you be more
> specific about which (a) tests and (b) src files constitute your additions
> (there are lots of files there ;)  )
>
> thanks
>
> stephen boesch
>
>
> 2013/2/21 Ashutosh Chauhan <ha...@apache.org>
>
>> Kumar,
>>
>> If you are willing to be on bleeding edge, this and many other
>> partitioning and windowing functionality some of us are developing in a
>> branch over at:
>> https://svn.apache.org/repos/asf/hive/branches/ptf-windowing
>> Check out this branch, build hive and than you can have row_number()
>> functionality. Look in
>> ql/src/test/queries/clientpositive/ptf_general_queries.q which has about 60
>> or so example queries demonstrating various capabilities which we have
>> already working (including row_number).
>> We hope to have this branch merged in trunk soon.
>>
>> Hope it helps,
>> Ashutosh
>> On Wed, Feb 20, 2013 at 11:33 PM, kumar mr <ku...@aol.com> wrote:
>>
>>> Hi,
>>>
>>>  This is Kumar, and this is my first question in this group.
>>>
>>>  I have a requirement to implement ROW_NUMBER() from Teradata in Hive
>>> where partitioning happens on multiple columns along with multiple column
>>> ordering.
>>> It can be easily implemented in Hadoop MR, but I have to do in Hive. By
>>> doing in UDF can assign same rank to grouping key considering dataset is
>>> small, but ordering need to be done in prior step.
>>> Can we do this in lot simpler way?
>>>
>>>  Thanks in advance.
>>>
>>>  Regards,
>>> Kumar
>>>
>>
>>
>

Re: ROW_NUMBER() equivalent in Hive

Posted by Stephen Boesch <ja...@gmail.com>.

Hi Ashutosh,
   I am interested / reviewing your windowing feature.  Can you be more
specific about which (a) tests and (b) src files constitute your additions
(there are lots of files there ;)  )

thanks

stephen boesch


2013/2/21 Ashutosh Chauhan <ha...@apache.org>

> Kumar,
>
> If you are willing to be on bleeding edge, this and many other
> partitioning and windowing functionality some of us are developing in a
> branch over at:
> https://svn.apache.org/repos/asf/hive/branches/ptf-windowing
> Check out this branch, build hive and than you can have row_number()
> functionality. Look in
> ql/src/test/queries/clientpositive/ptf_general_queries.q which has about 60
> or so example queries demonstrating various capabilities which we have
> already working (including row_number).
> We hope to have this branch merged in trunk soon.
>
> Hope it helps,
> Ashutosh
> On Wed, Feb 20, 2013 at 11:33 PM, kumar mr <ku...@aol.com> wrote:
>
>> Hi,
>>
>>  This is Kumar, and this is my first question in this group.
>>
>>  I have a requirement to implement ROW_NUMBER() from Teradata in Hive
>> where partitioning happens on multiple columns along with multiple column
>> ordering.
>> It can be easily implemented in Hadoop MR, but I have to do in Hive. By
>> doing in UDF can assign same rank to grouping key considering dataset is
>> small, but ordering need to be done in prior step.
>> Can we do this in lot simpler way?
>>
>>  Thanks in advance.
>>
>>  Regards,
>> Kumar
>>
>
>

Re: ROW_NUMBER() equivalent in Hive

Posted by Ashutosh Chauhan <ha...@apache.org>.

Kumar,

If you are willing to be on bleeding edge, this and many other partitioning
and windowing functionality some of us are developing in a branch over at:
https://svn.apache.org/repos/asf/hive/branches/ptf-windowing
Check out this branch, build hive and than you can have row_number()
functionality. Look in
ql/src/test/queries/clientpositive/ptf_general_queries.q which has about 60
or so example queries demonstrating various capabilities which we have
already working (including row_number).
We hope to have this branch merged in trunk soon.

Hope it helps,
Ashutosh
On Wed, Feb 20, 2013 at 11:33 PM, kumar mr <ku...@aol.com> wrote:

> Hi,
>
>  This is Kumar, and this is my first question in this group.
>
>  I have a requirement to implement ROW_NUMBER() from Teradata in Hive
> where partitioning happens on multiple columns along with multiple column
> ordering.
> It can be easily implemented in Hadoop MR, but I have to do in Hive. By
> doing in UDF can assign same rank to grouping key considering dataset is
> small, but ordering need to be done in prior step.
> Can we do this in lot simpler way?
>
>  Thanks in advance.
>
>  Regards,
> Kumar
>

Re: ROW_NUMBER() equivalent in Hive

Posted by Owen O'Malley <om...@apache.org>.

What are the semantics for ROW_NUMBER? Is it a global row number? Per a
partition? Per a bucket?

-- Owen


On Wed, Feb 20, 2013 at 11:33 PM, kumar mr <ku...@aol.com> wrote:

> Hi,
>
>  This is Kumar, and this is my first question in this group.
>
>  I have a requirement to implement ROW_NUMBER() from Teradata in Hive
> where partitioning happens on multiple columns along with multiple column
> ordering.
> It can be easily implemented in Hadoop MR, but I have to do in Hive. By
> doing in UDF can assign same rank to grouping key considering dataset is
> small, but ordering need to be done in prior step.
> Can we do this in lot simpler way?
>
>  Thanks in advance.
>
>  Regards,
> Kumar
>