Posted to user@pig.apache.org by Jeff Zhang <zj...@gmail.com> on 2010/05/17 09:34:57 UTC

Re: Initial Benchmark Results

Does anyone still have the comparison report? The link seems to be broken now.



On Tue, Jan 19, 2010 at 8:34 AM, Rob Stewart
<ro...@googlemail.com> wrote:
> Hi folks,
>
> I have some initial results to run through with you. I have a number of
> implementations ready to push onto the Hadoop cluster, but I have finalized
> the tests for Hive, JAQL and Pig for the simple WordCount application.
>
> The experiments were performed on a 32 node cluster (31 datanodes). The
> input data was generated with the Pig DataGenerator (thanks go to Alan
> Gates).
>
> The reason for asking you all at this early stage is that Pig seems to
> fall some way behind when it comes to overall execution time. The experiment
> is a scale-up scenario, where I have fixed the nodes at 31 and increased
> the data to be processed. I have not yet done a scale-out experiment, but
> that is my next task (i.e. 10 million data rows on 10 DataNodes should
> execute only slightly quicker than 20 million rows on 20 DataNodes).
>
> Anyway, let me point you to the test results:
> http://www.macs.hw.ac.uk/~rs46/WordCount_Scale_Up_Execution.pdf
>
> That PDF includes the Pig script I used and the reasoning behind the
> parallelism I specified. I have not yet executed the Java MapReduce
> wordcount, but I imagine it will undercut the Hive performance.
>
>
> Does anyone have any comments on the results, any reasons why Pig performs
> poorly, or see any obvious error I have made? That said, this is the first
> of a number of experiments, and Pig may shine in others (for instance, I am
> planning a skewed join benchmark, which will be interesting as all languages
> have an implementation for skewed data).
>
>
> Thanks,
>
>
> Rob Stewart
>



-- 
Best Regards

Jeff Zhang

Re: Initial Benchmark Results

Posted by Jeff Zhang <zj...@gmail.com>.
Thanks, Mads.



On Tue, May 18, 2010 at 4:48 AM, Mads Moeller <xm...@gmail.com> wrote:
> Hi Jeff,
>
> It seems to have been moved.
> http://www.macs.hw.ac.uk/~rs46/publications.html



-- 
Best Regards

Jeff Zhang

Re: Initial Benchmark Results

Posted by Mads Moeller <xm...@gmail.com>.
Hi Jeff,

It seems to have been moved.
http://www.macs.hw.ac.uk/~rs46/publications.html


