Posted to user@pig.apache.org by Alex Soto <al...@envieta.com> on 2018/04/27 17:49:10 UTC

Out of memory running on Yarn

Hello,

I am using Pig version 0.17.0.  When I attempt to run my pig script from the command line on a Yarn cluster I get out of memory errors.  From the Yarn application logs, I see this stack trace:

2018-04-27 13:22:10,543 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOfRange(Arrays.java:3664)
	at java.lang.String.<init>(String.java:207)
	at java.lang.StringBuilder.toString(StringBuilder.java:407)
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2992)
	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2817)
	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2689)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1326)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1298)
	at org.apache.pig.backend.hadoop.datastorage.ConfigurationUtil.mergeConf(ConfigurationUtil.java:70)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setLocation(PigOutputFormat.java:185)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.setUpContext(PigOutputCommitter.java:115)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:89)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.<init>(PigOutputCommitter.java:70)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:297)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:550)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:532)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1779)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:532)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:309)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1737)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1734)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1668)


Now, in trying to increase the heap size, I added this to the beginning of the script:


SET mapreduce.map.java.opts '-Xmx2048m';
SET mapreduce.reduce.java.opts '-Xmx2048m';
SET mapreduce.map.memory.mb 2536;
SET mapreduce.reduce.memory.mb 2536;

But this has no effect; the settings appear to be ignored.  From the Yarn logs, I see the container being launched with a 1024m heap:

echo "Launching container"
exec /bin/bash -c "$JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/hadoop/logs/userlogs/application_1523452171521_0223/container_1523452171521_0223_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/opt/hadoop/logs/userlogs/application_1523452171521_0223/container_1523452171521_0223_01_000001/stdout 2>/opt/hadoop/logs/userlogs/application_1523452171521_0223/container_1523452171521_0223_01_000001/stderr"

I also tried setting the memory requirements with the PIG_OPTS environment variable:

export PIG_OPTS="-Dmapreduce.reduce.memory.mb=5000 -Dmapreduce.map.memory.mb=5000 -Dmapreduce.map.java.opts=-Xmx5000m"

No matter what I do, the container is always launched with -Xmx1024m and the same OOM error occurs.
The question is, what is the proper way to specify the heap sizes for my Pig mappers and reducers?

Best regards,
Alex Soto



Re: Out of memory running on Yarn

Posted by Alex Soto <al...@envieta.com>.
Hi Koji,

That did help, thank you.  Now, can I specify this in the PIG_OPTS environment variable instead of in the Pig script?
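
For reference, a sketch of what that might look like, reusing the PIG_OPTS form from my first message but with the ApplicationMaster properties; whether Pig forwards these -D properties into the job configuration this way is an assumption on my part:

# Hypothetical: pass the same ApplicationMaster settings as -D system properties
# (assumes these are picked up into the job configuration, like the mapreduce.* ones)
export PIG_OPTS="-Dyarn.app.mapreduce.am.resource.mb=3584 -Dyarn.app.mapreduce.am.command-opts=-Xmx3096m"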

Best regards,
Alex Soto




> On Apr 27, 2018, at 2:21 PM, Koji Noguchi <kn...@oath.com.INVALID> wrote:
> 
> Hi Alex,
> 
> Can you try increasing the heapsize of the ApplicationMaster?
> 
> yarn.app.mapreduce.am.resource.mb=3584
> yarn.app.mapreduce.am.command-opts=-Xmx3096m
> 
> Koji

Re: Out of memory running on Yarn

Posted by Koji Noguchi <kn...@oath.com.INVALID>.
Hi Alex,

Can you try increasing the heapsize of the ApplicationMaster?

yarn.app.mapreduce.am.resource.mb=3584
yarn.app.mapreduce.am.command-opts=-Xmx3096m
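
These could also go at the top of the Pig script using the same SET syntax as the mapreduce.* properties you already have (a sketch; adjust the values to your cluster):

-- Sketch: raise the MapReduce ApplicationMaster container size and its heap
SET yarn.app.mapreduce.am.resource.mb 3584;
SET yarn.app.mapreduce.am.command-opts '-Xmx3096m';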

Koji


