Posted to dev@pig.apache.org by Cheolsoo Park <ch...@cloudera.com> on 2012/09/23 22:43:56 UTC

e2e tests for Rank function

Hello,

The e2e tests for the Rank function in trunk do not pass for me when running in
local mode. I am wondering whether they all pass for everyone.

What I am doing is as follows:

ant clean
ant -Dhadoopversion=20 ... test-e2e-deploy-local
ant -Dhadoopversion=20 ... test-e2e-local -Dtests.to.run="-t Rank"

All tests except Rank_4 fail with errors similar to this:

java.io.IOException: Illegal partition for Null: false index: 0 (1,7) (1)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
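
A note on reading this trace: the exception is raised in
MapOutputBuffer.collect, which rejects any record whose partition index falls
outside the range of reduce partitions configured for the job. As I read the
message, the text after "Illegal partition for" is the key and the trailing
"(1)" is the partition index the partitioner returned. The Python sketch below
only models that range check; the function names and the assumption that the
job had a single reduce partition are illustrative, not taken from the Hadoop
source.

```python
class IllegalPartitionError(Exception):
    """Stand-in for the java.io.IOException seen in the trace."""


def collect(key, partition, num_partitions):
    # A record may only be sent to partitions 0 .. num_partitions - 1.
    if partition < 0 or partition >= num_partitions:
        raise IllegalPartitionError(
            f"Illegal partition for {key} ({partition})"
        )
    return partition


def try_collect(key, partition, num_partitions):
    # Helper for experimenting: returns "ok" or the error message.
    try:
        collect(key, partition, num_partitions)
        return "ok"
    except IllegalPartitionError as err:
        return str(err)
```

Under the single-partition assumption, index 0 is accepted and index 1 is
rejected with a message of the same shape as the one above.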

I wanted to double check whether I am doing something wrong before I open a
JIRA.

Thanks,
Cheolsoo

Re: e2e tests for Rank function

Posted by Gianmarco De Francisci Morales <gd...@apache.org>.
Hi,

Weird, they should be passing.
I will double check them tomorrow.

Cheers,
--
Gianmarco




Re: e2e tests for Rank function

Posted by Gianmarco De Francisci Morales <gd...@apache.org>.
I was able to reproduce the bug and opened PIG-2932 to track it.

Cheers,
--
Gianmarco




Fwd: e2e tests for Rank function

Posted by Gianmarco De Francisci Morales <gd...@apache.org>.
Forwarding to pig-dev.

Summary: it looks like we have a regression in trunk.
We need to investigate it before branching 0.11.

Cheers,
--
Gianmarco



---------- Forwarded message ----------
From: Allan <aa...@gmail.com>
Date: Wed, Sep 26, 2012 at 11:21 AM
Subject: Re: e2e tests for Rank function
To: cheolsoo <ch...@cloudera.com>, Gianmarco De Francisci Morales <
gdfm@apache.org>


Hi Cheolsoo and Gianmarco,

I double-checked the e2e tests and reproduced the scenario. The report is
correct: they are failing.

Then, looking for a possible reason, I tried the following script:

SET default_parallel 9;
A = LOAD 'prerank' using PigStorage(',') as
(rownumber:long,rankcabd:long,rankbdaa:long,rankbdca:long,rankaacd:long,rankaaba:long,a:int,b:int,c:int,tail:bytearray);
B = group A by (a, b);
C = foreach B generate flatten(group),A;
D = order C by group::a ASC, group::b ASC;


And it fails with the same exception message.

Then I tried the same script without the "SET default_parallel 9;" line,
and it works. So I'm really surprised that in local mode it doesn't work
with parallelism.

I used this script because the RANK (RANK BY) operator uses the same chain
of operators: a GROUP (B), a flatten (C), and a SORT (D).
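
One possible reading of this result, stated as an assumption rather than a
diagnosis: the ORDER BY step assigns each key to one of default_parallel key
ranges, while the local runner may execute the job with fewer reduce
partitions (a single one is assumed here), so any range index above 0 would be
rejected. The Python sketch below is a toy range partitioner, not Pig's actual
one; make_boundaries, range_partition and illegal_keys are hypothetical names.

```python
from bisect import bisect_right


def make_boundaries(sorted_sample, requested_parallel):
    """Pick requested_parallel - 1 split points from a sorted sample of keys."""
    n = len(sorted_sample)
    return [
        sorted_sample[(i * n) // requested_parallel]
        for i in range(1, requested_parallel)
    ]


def range_partition(key, boundaries):
    """Index of the key range holding `key`, from 0 to len(boundaries)."""
    return bisect_right(boundaries, key)


def illegal_keys(keys, boundaries, actual_reducers):
    """Keys whose range index does not fit in the reducers actually running."""
    return [
        k for k in keys
        if range_partition(k, boundaries) >= actual_reducers
    ]


if __name__ == "__main__":
    # Nine distinct (a, b) keys, already in sorted order.
    keys = [(a, b) for a in range(1, 4) for b in range(1, 4)]
    print(illegal_keys(keys, make_boundaries(keys, 9), 1))
    print(illegal_keys(keys, make_boundaries(keys, 1), 1))
```

With 9 distinct (a, b) keys and 9 requested ranges, every key except the
smallest lands outside a single reducer, while requesting a single range
leaves nothing out of bounds. That would be consistent with the script passing
once the SET line is dropped.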

Best regards,




-- 

Allan Avendaño S.
Computer Engineer
SWY22 Participant
GSOC 2012 Participant
Rome - Italy
Gmail: aavendan@gmail.com
--