Posted to common-user@hadoop.apache.org by Aayush Garg <aa...@gmail.com> on 2010/03/03 18:21:39 UTC
Sorting
Hi,
Suppose I need to sort a big file (gigabytes in size). How would I accomplish
this task using Hadoop?
My main problem is how to merge the output of the individual reduce tasks.
thanks
Re: Sorting
Posted by Thejas Nair <te...@yahoo-inc.com>.
If you don't want to implement all that, then just use three lines of Pig.
l = load 'file';
o = order l by $1;
store o into 'file.sorted';
-Thejas
On 3/4/10 2:17 PM, "Alex Kozlov" <al...@cloudera.com> wrote:
> Hi Aayush,
>
> In short, you write a special partitioner that partitions the data into
> non-overlapping intervals.
>
> There are a few articles on this with a lot more detail:
>
> http://sortbenchmark.org/YahooHadoop.pdf
> http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html
>
> Alex K
Re: Sorting
Posted by Alex Kozlov <al...@cloudera.com>.
Hi Aayush,
In short, you write a special partitioner that partitions the data into
non-overlapping intervals.
There are a few articles on this with a lot more detail:
http://sortbenchmark.org/YahooHadoop.pdf
http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html
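To make the range-partitioning idea concrete, here is a minimal sketch in plain Python rather than Hadoop code. It simulates the map, partition, and reduce steps in memory; the function names and the split points "d" and "n" are hypothetical, chosen only for illustration. It shows why no extra merge step is needed: each partition covers its own key interval, so concatenating the sorted partitions in order already gives a totally sorted result.

```python
import bisect

def range_partition(key, split_points):
    """Return the index of the partition whose key interval contains `key`.

    `split_points` must be sorted; n split points define n + 1 intervals.
    """
    return bisect.bisect_right(split_points, key)

def simulated_sort(records, split_points):
    # "Map" side: route every record to exactly one partition by key range.
    partitions = [[] for _ in range(len(split_points) + 1)]
    for rec in records:
        partitions[range_partition(rec, split_points)].append(rec)
    # "Reduce" side: each partition is sorted independently of the others.
    sorted_parts = [sorted(p) for p in partitions]
    # The intervals do not overlap, so concatenating the partitions in
    # index order yields a globally sorted sequence.
    merged = [rec for part in sorted_parts for rec in part]
    return sorted_parts, merged

records = ["pear", "apple", "zebra", "mango", "kiwi", "banana", "cherry"]
parts, merged = simulated_sort(records, split_points=["d", "n"])
print(parts)
print(merged)
```

In a real job the partitions would be the reducers' output files, and reading them in partition order plays the role of the concatenation here.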
Alex K
Re: Sorting
Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Sample your input data and use the sample to drive your partitioner.
Please take a look at the TeraSort example in
org.apache.hadoop.examples.terasort.
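As a rough illustration of sampling to drive a partitioner, the following plain-Python sketch picks split points from the quantiles of a random sample and then counts how many keys fall into each range. It is not the TeraSort code; the function names, the sample size of 500, and the synthetic Gaussian keys are all assumptions made for the example.

```python
import bisect
import random

def pick_split_points(sample, num_partitions):
    """Choose num_partitions - 1 split points at evenly spaced quantiles."""
    ordered = sorted(sample)
    return [ordered[len(ordered) * i // num_partitions]
            for i in range(1, num_partitions)]

def partition_sizes(keys, split_points):
    """Count how many keys land in each key range defined by the split points."""
    sizes = [0] * (len(split_points) + 1)
    for key in keys:
        sizes[bisect.bisect_right(split_points, key)] += 1
    return sizes

rng = random.Random(42)                            # fixed seed for repeatability
keys = [rng.gauss(0, 1) for _ in range(10000)]     # unevenly distributed keys
sample = rng.sample(keys, 500)                     # small sample of the input
splits = pick_split_points(sample, num_partitions=4)
sizes = partition_sizes(keys, splits)
print(splits)
print(sizes)
```

Because the split points follow the sampled distribution rather than fixed, evenly spaced values, each range should receive roughly the same number of keys even when the keys are skewed, which keeps the reducers' workloads balanced.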
Arun