Posted to user@hive.apache.org by Jeff Zhang <zj...@gmail.com> on 2010/06/11 11:24:00 UTC

Is anybody working on the globally "order by" of hive ?

Hi all,

According to the Hive wiki, Hive does not support a global "order by";
"sort by" only sorts within each reducer. Our team thinks a global
"order by" is an important feature for users, so I am wondering whether
anybody is working on it. I would be very interested in getting involved.


-- 
Best Regards

Jeff Zhang

Re: Is anybody working on the globally "order by" of hive ?

Posted by Jeff Zhang <zj...@gmail.com>.
Great, I can work on this issue.




On Sat, Jun 12, 2010 at 2:02 PM, Jeff Hammerbacher <ha...@cloudera.com> wrote:
> See https://issues.apache.org/jira/browse/HIVE-1402.
>
> On Fri, Jun 11, 2010 at 1:22 PM, John Sichi <js...@facebook.com> wrote:
>
>> If someone is interested in adding parallel ORDER BY to Hive (using
>> TotalOrderPartitioner), here's a good starting point:
>>
>> http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad
>>
>> The goal would be to take that manual two-step sample-then-sort process and
>> turn it into an automatic plan within Hive.  I have a better example for the
>> sampling query which I haven't published yet.
>>
>> We would also need to name the final output files in such a way that the
>> total order could be iterated via the filenames.
>>
>



-- 
Best Regards

Jeff Zhang

Re: Is anybody working on the globally "order by" of hive ?

Posted by Jeff Hammerbacher <ha...@cloudera.com>.
See https://issues.apache.org/jira/browse/HIVE-1402.

On Fri, Jun 11, 2010 at 1:22 PM, John Sichi <js...@facebook.com> wrote:

> If someone is interested in adding parallel ORDER BY to Hive (using
> TotalOrderPartitioner), here's a good starting point:
>
> http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad
>
> The goal would be to take that manual two-step sample-then-sort process and
> turn it into an automatic plan within Hive.  I have a better example for the
> sampling query which I haven't published yet.
>
> We would also need to name the final output files in such a way that the
> total order could be iterated via the filenames.
>

RE: Is anybody working on the globally "order by" of hive ?

Posted by John Sichi <js...@facebook.com>.
If someone is interested in adding parallel ORDER BY to Hive (using TotalOrderPartitioner), here's a good starting point:

http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad

The goal would be to take that manual two-step sample-then-sort process and turn it into an automatic plan within Hive.  I have a better example for the sampling query which I haven't published yet.

We would also need to name the final output files in such a way that the total order could be iterated via the filenames.
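
As a rough illustration of the sample-then-sort idea described above, here is a minimal Python sketch. It is not Hive or Hadoop code and does not use TotalOrderPartitioner itself; the function names, the sample size, and the "part-NNNNN" naming are all assumptions made for the example. It samples the keys, picks split points, routes each key to a range partition, sorts each partition independently, and names the outputs so that filename order matches key order.

```python
import bisect
import random


def choose_split_points(sample, num_partitions):
    """Pick num_partitions - 1 boundary keys from a sample (hypothetical helper)."""
    ordered = sorted(sample)
    if not ordered:
        return []
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def partition_index(key, split_points):
    """Range partitioner: a key goes to the partition whose range contains it."""
    return bisect.bisect_right(split_points, key)


def total_order_sort(keys, num_partitions, sample_size=100, seed=0):
    """Return a dict of output name -> sorted keys, one entry per partition."""
    rng = random.Random(seed)
    # Step 1: sample the input to estimate the key distribution.
    sample = rng.sample(keys, min(sample_size, len(keys)))
    split_points = choose_split_points(sample, num_partitions)
    # Step 2: route each key to its range partition. Each partition can then
    # be sorted independently, which is the part parallel reducers would do.
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[partition_index(key, split_points)].append(key)
    # Zero-padded names so that lexicographic filename order matches key order.
    return {f"part-{i:05d}": sorted(p) for i, p in enumerate(partitions)}
```

Reading the outputs in sorted filename order and concatenating them yields the full input in sorted order, because every key in one partition is less than or equal to every key in the next.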

JVS

________________________________________
From: Ning Zhang [nzhang@facebook.com]
Sent: Friday, June 11, 2010 12:40 PM
To: 'hive-user@hadoop.apache.org'
Cc: 'hive-dev@hadoop.apache.org'
Subject: Re: Is anybody working on the globally "order by" of hive ?

Good idea, Edward. It would definitely be better if it is what it sounds like.

Btw Jeff, order by is supported in trunk with certain limitations in strict mode (it has to have a limit). I will be able to update the wiki when I come back.

Thanks,
Ning
------
Sent from my blackberry

________________________________
From: Edward Capriolo <ed...@gmail.com>
To: hive-user@hadoop.apache.org <hi...@hadoop.apache.org>
Cc: hive-dev@hadoop.apache.org <hi...@hadoop.apache.org>
Sent: Fri Jun 11 11:13:57 2010
Subject: Re: Is anybody working on the globally "order by" of hive ?


On Fri, Jun 11, 2010 at 5:24 AM, Jeff Zhang <zj...@gmail.com> wrote:
Hi all,

From the wiki of hive, Hive do not have the feature of globally "order
by", the sort by of hive is for each reducer. Our team think the
globally "order by" is an important feature for users, so wondering is
anybody working it ? I am very interested to been involved.


--
Best Regards

Jeff Zhang

Jeff,

I was wondering if TotalOrderPartitioner in hadoop 20 could play a role in this. As of now order by sets reduce tasks to 1 :)

Edward

Re: Is anybody working on the globally "order by" of hive ?

Posted by Ning Zhang <nz...@facebook.com>.
Good idea, Edward. It would definitely be better if it is what it sounds like.

Btw Jeff, order by is supported in trunk with certain limitations in strict mode (it has to have a limit). I will be able to update the wiki when I come back.

Thanks,
Ning
------
Sent from my blackberry

________________________________
From: Edward Capriolo <ed...@gmail.com>
To: hive-user@hadoop.apache.org <hi...@hadoop.apache.org>
Cc: hive-dev@hadoop.apache.org <hi...@hadoop.apache.org>
Sent: Fri Jun 11 11:13:57 2010
Subject: Re: Is anybody working on the globally "order by" of hive ?


On Fri, Jun 11, 2010 at 5:24 AM, Jeff Zhang <zj...@gmail.com> wrote:
Hi all,

From the wiki of hive, Hive do not have the feature of globally "order
by", the sort by of hive is for each reducer. Our team think the
globally "order by" is an important feature for users, so wondering is
anybody working it ? I am very interested to been involved.


--
Best Regards

Jeff Zhang

Jeff,

I was wondering if TotalOrderPartitioner in hadoop 20 could play a role in this. As of now order by sets reduce tasks to 1 :)

Edward

Re: Is anybody working on the globally "order by" of hive ?

Posted by Edward Capriolo <ed...@gmail.com>.
On Fri, Jun 11, 2010 at 5:24 AM, Jeff Zhang <zj...@gmail.com> wrote:

> Hi all,
>
> From the wiki of hive, Hive do not have the feature of globally "order
> by", the sort by of hive is for each reducer. Our team think the
> globally "order by" is an important feature for users, so wondering is
> anybody working it ? I am very interested to been involved.
>
>
> --
> Best Regards
>
> Jeff Zhang
>

Jeff,

I was wondering if TotalOrderPartitioner in hadoop 20 could play a role in
this. As of now order by sets reduce tasks to 1 :)

Edward
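
To make the difference discussed in this thread concrete, here is a small Python sketch (not Hive code; the function names and the modulo stand-in for a hash partitioner are assumptions for illustration). It contrasts sorting within each reducer, as "sort by" does, with forcing a single reducer, which is how "order by" gets a global order as described above.

```python
def sort_by(rows, num_reducers):
    """Partition rows across reducers, then sort within each reducer only."""
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        # Modulo is a stand-in for a hash partitioner (assumption).
        reducers[row % num_reducers].append(row)
    return [sorted(r) for r in reducers]


def order_by_single_reducer(rows):
    """Global order by sending every row to one reducer."""
    return sort_by(rows, 1)[0]


rows = [5, 2, 8, 1, 9, 4]
print(sort_by(rows, 2))               # [[2, 4, 8], [1, 5, 9]]
print(order_by_single_reducer(rows))  # [1, 2, 4, 5, 8, 9]
```

Each reducer's output is sorted, but concatenating the two outputs is not, which is why a global order currently needs either one reducer or a range partitioner.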
