Posted to dev@pig.apache.org by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org> on 2012/09/26 15:43:07 UTC
[jira] [Created] (PIG-2932) Setting high default_parallel causes
IOException in local mode
Gianmarco De Francisci Morales created PIG-2932:
---------------------------------------------------
Summary: Setting high default_parallel causes IOException in local mode
Key: PIG-2932
URL: https://issues.apache.org/jira/browse/PIG-2932
Project: Pig
Issue Type: Bug
Reporter: Gianmarco De Francisci Morales
Priority: Critical
This bug has been confirmed only in local mode.
When setting a high default_parallel, Pig fails on some operations.
The following data and script reproduce the bug.
Data:
{code}
grunt> cat file.txt
11 1 qwer
12 2 qwerty
13 3 ert
13 3 ertyu
14 4 zxcv
16 6 fsdfg
16 6 fdfghj
18 8 fjklopi
{code}
Script:
{code}
SET default_parallel 9;
a = load 'file.txt' as (id1:int, id2:int, str:chararray);
b = group a by (id1,id2);
c = foreach b generate flatten(group), a;
d = order c by group::id1 ASC, group::id2 ASC;
dump d;
{code}
Error:
{code}
2012-09-26 15:28:13,230 [Thread-32] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: d[12,4] C: R:
2012-09-26 15:28:13,232 [Thread-32] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0009
java.io.IOException: Illegal partition for Null: false index: 0 (12,2) (1)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
{code}
The script succeeds if default_parallel is set to 2.
I guess it depends on the fact that the default_parallel is higher than the number of unique keys, probably some quirk with ORDER BY.
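The premise of this guess can be checked against the sample data. The snippet below is only an illustrative sketch in Python (not part of Pig): it copies the rows of file.txt from the report, counts the distinct (id1, id2) grouping keys, and compares that count with the default_parallel value used in the script. It says nothing about the actual cause of the failure; it only confirms that the parallelism is higher than the number of unique keys.

```python
# Rows copied from file.txt in the report: (id1, id2, str)
rows = [
    (11, 1, "qwer"),
    (12, 2, "qwerty"),
    (13, 3, "ert"),
    (13, 3, "ertyu"),
    (14, 4, "zxcv"),
    (16, 6, "fsdfg"),
    (16, 6, "fdfghj"),
    (18, 8, "fjklopi"),
]

# Value set by "SET default_parallel 9" in the script
default_parallel = 9

# Distinct grouping keys, mirroring "group a by (id1,id2)"
unique_keys = {(id1, id2) for id1, id2, _ in rows}

print(len(unique_keys))                     # 6
print(default_parallel > len(unique_keys))  # True
```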
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2932) Setting high default_parallel causes
IOException in local mode
Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464942#comment-13464942 ]
Cheolsoo Park commented on PIG-2932:
------------------------------------
Hi Allan,
That's expected. Any parallelism in local mode w/o MAPREDUCE-1367 will fail. If you compare LocalJobRunner of hadoop 1.0.0 and that of hadoop 2.0.0, you will see that MAPREDUCE-1367 is included in 2.0.0 but not in 1.0.0.
Again, your code runs fine in MR2 (hadoop-2.0.0). To really "fix" it, we have to replace the hadoop dependency in pig.jar, but all the hadoop 0.20.x and 1.0.0 releases have the same problem.
In fact, I ran into the same issue in PIG-2852, and I documented it in the Pig manual.
I suggest that we should change the title of the jira and disable Rank tests in local mode. Please let me know if you have a better suggestion.
Thanks!
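The "Illegal partition" message in the stack trace comes from a range check on the partition index in MapTask$MapOutputBuffer.collect. The following is a minimal Python model of that check, not Hadoop or Pig code. It assumes (this is an assumption, not something stated in this thread) that the local runner ends up with a single reduce partition while the ORDER BY partitioner was planned for 9. The names collect, range_partition, run_map_side and the split points in BOUNDARIES are all hypothetical and chosen only for illustration.

```python
import bisect

# Hypothetical split points for a sort planned with 9 partitions
# (8 boundaries give partition indexes 0..8). Not taken from Pig.
BOUNDARIES = [(11, 1), (12, 2), (13, 3), (14, 4),
              (15, 5), (16, 6), (17, 7), (18, 8)]


def range_partition(key, boundaries):
    """Hypothetical range partitioner: index of the first boundary >= key."""
    return bisect.bisect_left(boundaries, key)


def collect(key, partition, num_partitions):
    """Model of the bounds check that rejects out-of-range partitions."""
    if partition < 0 or partition >= num_partitions:
        raise IOError(f"Illegal partition for {key} ({partition})")
    return partition


def run_map_side(keys, boundaries, num_partitions):
    """Partition every key and pass the result through the bounds check."""
    return [collect(k, range_partition(k, boundaries), num_partitions)
            for k in keys]


if __name__ == "__main__":
    keys = [(11, 1), (12, 2), (13, 3)]
    # Enough partitions: every index is accepted.
    print(run_map_side(keys, BOUNDARIES, 9))
    # Only one partition available: key (12, 2) maps to index 1 and is rejected.
    try:
        run_map_side(keys, BOUNDARIES, 1)
    except IOError as err:
        print(err)
```

In this model the first key maps to partition 0 and passes, which would be consistent with a small parallelism value sometimes succeeding, but that reading is a guess rather than a confirmed explanation.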
[jira] [Commented] (PIG-2932) Setting high default_parallel causes
IOException in local mode
Posted by "Allan Avendaño (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465483#comment-13465483 ]
Allan Avendaño commented on PIG-2932:
-------------------------------------
Thanks for the explanation Cheolsoo.
[jira] [Updated] (PIG-2932) Setting high default_parallel causes
IOException in local mode
Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park updated PIG-2932:
-------------------------------
Attachment: PIG-2932.patch
Attaching a patch that disables Rank tests in local mode.
[jira] [Commented] (PIG-2932) Setting high default_parallel causes
IOException in local mode
Posted by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465232#comment-13465232 ]
Gianmarco De Francisci Morales commented on PIG-2932:
-----------------------------------------------------
Cheolsoo, thanks for the explanation.
It is clearer now.
I agree with your proposals.
Will test and commit the patch tomorrow.
[jira] [Commented] (PIG-2932) Setting high default_parallel causes
IOException in local mode
Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464380#comment-13464380 ]
Cheolsoo Park commented on PIG-2932:
------------------------------------
In fact, the example script works fine with MR2 in local mode, so that makes me believe that this is an issue with LocalJobRunner of hadoop 0.20. Please see MAPREDUCE-1367.
The best fix is probably disabling the tests in local mode. I don't think that there is a way to disable tests only when hadoopversion == 20 && execonly == local.
Thanks!
[jira] [Commented] (PIG-2932) Setting high default_parallel causes
IOException in local mode
Posted by "Allan Avendaño (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464628#comment-13464628 ]
Allan Avendaño commented on PIG-2932:
-------------------------------------
I think the patch can solve the problem with running the Rank tests in local mode.
However, I tried the following code and it failed again:
{code}
SET default_parallel 3;
a = load 'file.txt' as (id1:int, id2:int, str:chararray);
b = order a by id1 ASC, id2 ASC;
dump b;
{code}
I'm using Hadoop 1.0.0.
[jira] [Updated] (PIG-2932) Setting high default_parallel causes
IOException in local mode
Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park updated PIG-2932:
-------------------------------
Assignee: Cheolsoo Park
Status: Patch Available (was: Open)
[jira] [Updated] (PIG-2932) Setting high default_parallel causes
IOException in local mode
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-2932:
----------------------------
Resolution: Fixed
Fix Version/s: 0.11
Status: Resolved (was: Patch Available)
Patch checked in. Thanks Cheolsoo.