Posted to dev@pig.apache.org by "Carlos Balduz (JIRA)" <ji...@apache.org> on 2015/04/23 13:00:44 UTC

[jira] [Commented] (PIG-4516) RANK in Tez generates OOM: Java heap space

    [ https://issues.apache.org/jira/browse/PIG-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508839#comment-14508839 ] 

Carlos Balduz commented on PIG-4516:
------------------------------------

This is the script I am using:

{code}
A = LOAD 'my/input/data' USING PigStorage(',', '-schema');

B = FOREACH A GENERATE CONCAT(CONCAT(x, '-'), y) AS common_column;
C = FOREACH A GENERATE z AS common_column;
C = DISTINCT C;

D = UNION ONSCHEMA B, C;

E = RANK D BY common_column;
{code}

> RANK in Tez generates OOM: Java heap space
> ------------------------------------------
>
>                 Key: PIG-4516
>                 URL: https://issues.apache.org/jira/browse/PIG-4516
>             Project: Pig
>          Issue Type: Bug
>          Components: tez
>    Affects Versions: 0.14.0
>            Reporter: Carlos Balduz
>
> Running a script with a RANK operator on Tez produces an OOM error (Java heap space). The script continues and finishes successfully, but several task attempts fail during execution.
> {code}
> 2015-04-23 12:48:54,881 INFO [AsyncDispatcher event handler] history.HistoryEventHandler: [HISTORY][DAG:dag_1429175510653_0318_1][Event:TASK_ATTEMPT_FINISHED]: vertexName=scope-75, taskAttemptId=attempt_1429175510653_0318_1_01_000111_0, startTime=1429786121169, finishTime=1429786134880, timeTaken=13711, status=FAILED, diagnostics=Error: Fatal Error cause TezChild exit.:java.lang.OutOfMemoryError: Java heap space
> 	at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.<init>(DefaultSorter.java:140)
> 	at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:114)
> 	at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.initializeOutputs(PigProcessor.java:299)
> 	at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:181)
> 	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)