Posted to dev@samza.apache.org by Jordi Blasi Uribarri <jb...@nextel.es> on 2015/09/22 10:06:10 UTC

container is running beyond virtual memory limits

Hi,

I am not really sure if this is related to any of the previous questions, so I am asking it in a new message. I am running three different Samza jobs that perform different actions and exchange information. As I found memory limits that were preventing the jobs from moving from ACCEPTED to RUNNING, I introduced some configuration changes in YARN, as suggested on this list:


yarn-site.xml

<configuration>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
    <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>512</value>
    <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
    <description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
    <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
  </property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>kfk-samza01</value>
</property>
</configuration>

capacity-scheduler.xml
Altered value:
    <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.5</value>
    <description>
      Maximum percent of resources in the cluster which can be used to run
      application masters i.e. controls number of concurrent running
      applications.
    </description>
  </property>

The jobs are configured to reduce the memory usage:

yarn.container.memory.mb=256
yarn.am.container.memory.mb=256

After introducing these changes I experienced a very noticeable reduction in speed. That seemed normal, as the memory assigned to the jobs was lowered and there were more of them running. Everything kept running until yesterday.

What I am seeing today is that the jobs are not moving from ACCEPTED to RUNNING. I have found the following in the log (full log at the end):

2015-09-22 09:54:36,661 INFO  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 10346 for container-id container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical memory used; 1.2 GB of 537.6 MB virtual memory used

I am not sure where that 1.2 GB comes from, and it makes the processes die.
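
My only guess is that 256 MB multiplied by 2.1, the default yarn.nodemanager.vmem-pmem-ratio, is exactly 537.6 MB, so the limit itself seems to come from that ratio, and the 1.2 GB looks like virtual address space reserved by the JVM rather than memory it actually uses. If relaxing that check were acceptable, I suppose something like this in yarn-site.xml would do it (untested on my side; the defaults mentioned are the Hadoop 2.6 ones):

  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>5</value>
    <description>Virtual memory allowed per MB of physical container memory (default 2.1).</description>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
    <description>Disable the virtual memory check entirely (default true).</description>
  </property>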

Thanks,

   Jordi




2015-09-22 09:54:36,519 INFO  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) - Removed ProcessTree with root 10271
2015-09-22 09:54:36,519 INFO  [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(999)) - Container container_1442908447829_0002_01_000001 transitioned from RUNNING to KILLING
2015-09-22 09:54:36,533 INFO  [AsyncDispatcher event handler] launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370)) - Cleaning up container container_1442908447829_0002_01_000001
2015-09-22 09:54:36,661 INFO  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 10346 for container-id container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical memory used; 1.2 GB of 537.6 MB virtual memory used
2015-09-22 09:54:36,661 WARN  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:isProcessTreeOverLimit(293)) - Process tree for container: container_1442908447829_0001_01_000001 running over twice the configured limit. Limit=563714432, current usage = 1269743616
2015-09-22 09:54:36,662 WARN  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(447)) - Container [pid=10346,containerID=container_1442908447829_0001_01_000001] is running beyond virtual memory limits. Current usage: 70.0 MB of 256 MB physical memory used; 1.2 GB of 537.6 MB virtual memory used. Killing container.
Dump of the process-tree for container_1442908447829_0001_01_000001 :
        |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
        |- 10346 10344 10346 10346 (java) 253 7 1269743616 17908 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server -Dsamza.container.name=samza-application-master -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001/container_1442908447829_0001_01_000001 -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/tmp -Xmx768M -XX:+PrintGCDateStamps -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001/container_1442908447829_0001_01_000001/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10241024 -d64 -cp /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-core-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-jaxrs-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBroker-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBroker-0.0.1-jar-with-dependencies.jar org.apache.samza.job.yarn.SamzaAppMaster

2015-09-22 09:54:36,663 INFO  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) - Removed ProcessTree with root 10346
2015-09-22 09:54:36,663 INFO  [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(999)) - Container container_1442908447829_0001_01_000001 transitioned from RUNNING to KILLING
2015-09-22 09:54:36,663 INFO  [AsyncDispatcher event handler] launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370)) - Cleaning up container container_1442908447829_0001_01_000001
________________________________
Jordi Blasi Uribarri
Área I+D+i

jblasi@nextel.es
Oficina Bilbao

[http://www.nextel.es/wp-content/uploads/Firma_Nextel_2015.png]

RE: container is running beyond virtual memory limits

Posted by Jordi Blasi Uribarri <jb...@nextel.es>.
To give as complete a view of my situation as possible, I am summarizing what I have done and what my problem is, so that you have the fullest information.

What I have done is the following, on two virtual machines with 4 cores and 4 GB of RAM each.
Install Debian 7.8, plain, with no graphical interface.
	apt-get install openjdk-7-jdk openjdk-7-jre git maven curl

	git clone http://git-wip-us.apache.org/repos/asf/samza.git
	./gradlew clean build

As there was a bug in the Keyrocks testing script, I just commented out the code in the TestTTL script.
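
An alternative, assuming that test is not needed for the build, would have been to skip the test task instead of editing it:

	./gradlew clean build -x test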

	wget http://apache.rediris.es/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
	tar -xvf hadoop-2.6.0.tar.gz

	vi conf/yarn-site.xml
		<configuration>
		<property>
		 <name>yarn.resourcemanager.hostname</name>
		 <value>kfk-samza01</value>
		</property>
		<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>2048</value>
		</property>
		<property>
		<name>yarn.scheduler.minimum-allocation-mb</name>
		<value>128</value>
		</property>
		<property>
		<name>yarn.nodemanager.resource.cpu-vcores</name>
		<value>3</value>
		</property>
		</configuration>

	cp ./etc/hadoop/capacity-scheduler.xml conf

	vi $HADOOP_YARN_HOME/conf/core-site.xml
		<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
		<configuration>
		    <property>
		      <name>fs.http.impl</name>
		      <value>org.apache.samza.util.hadoop.HttpFileSystem</value>
		    </property>
		</configuration>
		
	curl http://www.scala-lang.org/files/archive/scala-2.10.4.tgz > scala-2.10.4.tgz
	tar -xvf scala-2.10.4.tgz	
	cp /tmp/scala-2.10.4/lib/scala-compiler.jar $HADOOP_YARN_HOME/share/hadoop/hdfs/lib
	cp /tmp/scala-2.10.4/lib/scala-library.jar $HADOOP_YARN_HOME/share/hadoop/hdfs/lib
	curl -L http://search.maven.org/remotecontent?filepath=org/clapper/grizzled-slf4j_2.10/1.0.1/grizzled-slf4j_2.10-1.0.1.jar > $HADOOP_YARN_HOME/share/hadoop/hdfs/lib/grizzled-slf4j_2.10-1.0.1.jar
	curl -L http://search.maven.org/remotecontent?filepath=org/apache/samza/samza-yarn_2.10/0.9.1/samza-yarn_2.10-0.9.1.jar > $HADOOP_YARN_HOME/share/hadoop/hdfs/lib/samza-yarn_2.10-0.9.1.jar
	curl -L http://search.maven.org/remotecontent?filepath=org/apache/samza/samza-core_2.10/0.9.1/samza-core_2.10-0.9.1.jar > $HADOOP_YARN_HOME/share/hadoop/hdfs/lib/samza-core_2.10-0.9.1.jar

	cd /opt/hadoop-2.6.0/
	scp -r . 192.168.15.94:/opt/hadoop-2.6.0
	echo 192.168.15.92 >> conf/slaves
	echo 192.168.15.94 >>  conf/slaves
	sbin/start-yarn.sh
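
As a sanity check (not one of my original steps), the standard YARN CLI can confirm that both NodeManagers registered and show the state of submitted applications, run from the same /opt/hadoop-2.6.0 directory:

	bin/yarn node -list
	bin/yarn application -list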

I have copied all the scripts from the /opt/samza/samza-shell/src/main/bash/ folder into /opt/jobs/bin.

I have generated an Eclipse project with the Samza dependencies included via Maven and no jobs, packaged it, and copied it to /opt/jobs/lib.

I have generated an Eclipse project with the Samza dependencies included via Maven and three jobs that implement StreamTask and InitableTask. The task methods are empty, for testing purposes. The package is published in a folder served by the Apache web server.
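
For reference, each of the three jobs is essentially this minimal skeleton (the class name NoOpJob is only illustrative; the real classes are the ones referenced in the job options files below):

	package flow;

	import org.apache.samza.config.Config;
	import org.apache.samza.system.IncomingMessageEnvelope;
	import org.apache.samza.task.InitableTask;
	import org.apache.samza.task.MessageCollector;
	import org.apache.samza.task.StreamTask;
	import org.apache.samza.task.TaskContext;
	import org.apache.samza.task.TaskCoordinator;

	// Empty task used only to exercise job deployment; it consumes messages and does nothing.
	public class NoOpJob implements StreamTask, InitableTask {

	  @Override
	  public void init(Config config, TaskContext context) throws Exception {
	    // nothing to initialize in this test job
	  }

	  @Override
	  public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
	      TaskCoordinator coordinator) throws Exception {
	    // intentionally empty: the goal is only to get containers from ACCEPTED to RUNNING
	  }
	}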

I have created the associated job options files in the /opt/job/dtan folder, like this one:

task.class=flow.WorkFlow
job.name=flow.WorkFlow
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
yarn.package.path=http://192.168.15.92/jobs/DataAnalyzer-0.0.1-bin.tar.gz

systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.consumer.zookeeper.connect=kfk-kafka01:2181,kfk-kafka02:2181
systems.kafka.producer.bootstrap.servers=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:9093
systems.kafka.producer.metadata.broker.list=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:909

task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.system=kafka

task.inputs=kafka.flowtpc

serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory

systems.kafka.samza.msg.serde=string
systems.kafka.streams.tracetpc.samza.msg.serde=json

yarn.container.memory.mb=256
yarn.am.container.memory.mb=256

task.opts= -Xms128M -Xmx128M
task.commit.ms=100

What I see:
	- If I launch the three jobs, only one of them gets to the RUNNING state, the one called Router, and it is always the same one. The others stay in ACCEPTED until they are killed by the system. I have seen this error:
		Container [pid=23007,containerID=container_1443454508386_0003_01_000001] is running beyond virtual memory limits. Current usage: 13.9 MB of 256 MB physical memory used; 1.1 GB of 537.6 MB virtual memory used. Killing container
	- When I kill the jobs with the kill-yarn-job.sh script, the java process does not get killed.
	- Although I have set in the options that the jobs should be launched with -Xms128M -Xmx128M, I see that they run with -Xmx768M. I have even changed the run-class.sh script, but it does not change (see the note after this list).
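
One more detail that may matter (just my reading of the process list): the process I keep seeing with -Xmx768M is the ApplicationMaster (it runs org.apache.samza.job.yarn.SamzaAppMaster), and as far as I understand task.opts only applies to the SamzaContainer JVMs, not to the AM, so its heap would need to be configured separately. A sketch of what I mean (yarn.am.opts is taken from the Samza configuration table; I have not verified it against 0.9.1):

task.opts=-Xms128M -Xmx128M
yarn.am.opts=-Xmx128M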

Some of the things I am describing do not make sense to me, so I am lost as to what to do or where to look.

Thanks for your help,

	Jordi



-----Original Message-----
From: Jordi Blasi Uribarri [mailto:jblasi@nextel.es]
Sent: Monday, September 28, 2015 11:26
To: dev@samza.apache.org
Subject: RE: container is running beyond virtual memory limits

I just changed the task options file to add the following line:

task.opts=-Xmx128M

And I found no change in the behaviour. I see that the job is being launched with the default -Xmx768M value:

root      8296  8294  1 11:16 ?        00:00:05 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server -Dsamza.container.name=samza-application-master -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_1443431699703_0003/container_1443431699703_0003_01_000001 -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/tmp -Xmx768M -XX:+PrintGCDateStamps -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1443431699703_0003/container_1443431699703_0003_01_000001/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10241024 -d64 -cp /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/DataAnalyzer-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/DataAnalyzer-0.0.1-jar-with-dependencies.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-core-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-jaxrs-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar org.apache.samza.job.yarn.SamzaAppMaster

How do I set the correct value?

Thanks,

   Jordi

-----Original Message-----
From: Yi Pan [mailto:nickpan47@gmail.com] Sent: Monday, September 28, 2015 10:56
To: dev@samza.apache.org
Subject: Re: container is running beyond virtual memory limits

Hi, Jordi,

Please find the config variable task.opts in this table:
http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html

This allows you to add additional JVM opts when launching the containers.

-Yi

On Mon, Sep 28, 2015 at 1:48 AM, Jordi Blasi Uribarri <jb...@nextel.es>
wrote:

> The three tasks have a similar options file, like this one.
>
> task.class=flow.OperationJob
> job.name=flow.OperationJob
> job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
> yarn.package.path=http://IP/javaapp.tar.gz
>
>
> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemF
> actory
> systems.kafka.consumer.zookeeper.connect=kfk-kafka01:2181,kfk-kafka02:
> 2181
>
> systems.kafka.producer.bootstrap.servers=kfk-kafka01:9092,kfk-kafka01:
> 9093,kfk-kafka02:9092,kfk-kafka02:9093
>
> systems.kafka.producer.metadata.broker.list=kfk-kafka01:9092,kfk-kafka
> 01:9093,kfk-kafka02:9092,kfk-kafka02:909
>
>
> task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpo
> intManagerFactory
> task.checkpoint.system=kafka
> task.inputs=kafka.operationtpc
>
>
> serializers.registry.json.class=org.apache.samza.serializers.JsonSerde
> Factory
>
> serializers.registry.string.class=org.apache.samza.serializers.StringS
> erdeFactory
>
> systems.kafka.samza.msg.serde=string
> systems.kafka.streams.tracetpc.samza.msg.serde=json
>
> yarn.container.memory.mb=256
> yarn.am.container.memory.mb=256
>
> task.commit.ms=1000
> task.window.ms=60000
>
> Where do I have to change the XMX parameter?
>
> Thanks.
>
>      Jordi
>
>
> -----Mensaje original-----
> De: Yi Pan [mailto:nickpan47@gmail.com] Enviado el: lunes, 28 de 
> septiembre de 2015 10:39
> Para: dev@samza.apache.org
> Asunto: Re: container is running beyond virtual memory limits
>
> Hi, Jordi,
>
> Can you post your task.opts settings as well? The Xms and Xmx JVM opts 
> will play a role here as well. The Xmx size should be set to less than 
> yarn.container.memory.mb.
>
> -Yi
>
> On Tue, Sep 22, 2015 at 4:32 AM, Jordi Blasi Uribarri 
> <jb...@nextel.es>
> wrote:
>
> > I am seeing that I can not get even a single job running. I have 
> > recovered the original configuration of yarn-site.xml and 
> > capacity-scheduler.xml and that does not work. I am thinking that 
> > maybe there is some kind of information related to old jobs that 
> > have not been correctly cleaned when killing them. Is there any 
> > place where I can look to remove temporary files or something similar?
> >
> > Thanks
> >
> >         jordi
> >
> > -----Mensaje original-----
> > De: Jordi Blasi Uribarri [mailto:jblasi@nextel.es] Enviado el: 
> > martes,
> > 22 de septiembre de 2015 10:06
> > Para: dev@samza.apache.org
> > Asunto: container is running beyond virtual memory limits
> >
> > Hi,
> >
> > I am not really sure If this is related to any of the previous 
> > questions so I am asking it in a new message. I am running three 
> > different samza jobs that perform different actions and interchange 
> > information. As I found limits in the memory that were preventing 
> > the jobs to get from Accepted to Running I introduced some 
> > configurations in
> Yarn, as suggested in this list:
> >
> >
> > yarn-site.xml
> >
> > <configuration>
> >   <property>
> >     <name>yarn.scheduler.minimum-allocation-mb</name>
> >     <value>128</value>
> >     <description>Minimum limit of memory to allocate to each 
> > container request at the Resource Manager.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.maximum-allocation-mb</name>
> >     <value>512</value>
> >     <description>Maximum limit of memory to allocate to each 
> > container request at the Resource Manager.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.minimum-allocation-vcores</name>
> >     <value>1</value>
> >     <description>The minimum allocation for every container request 
> > at the RM, in terms of virtual CPU cores. Requests lower than this 
> > won't take effect, and the specified value will get allocated the 
> > minimum.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.maximum-allocation-vcores</name>
> >     <value>2</value>
> >     <description>The maximum allocation for every container request 
> > at the RM, in terms of virtual CPU cores. Requests higher than this 
> > won't take effect, and will get capped to this value.</description>
> >   </property>
> > <property>
> > <name>yarn.resourcemanager.hostname</name>
> > <value>kfk-samza01</value>
> > </property>
> > </configuration>
> >
> > capacity-scheduler.xml
> > Alter value
> >     <property>
> >     <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
> >     <value>0.5</value>
> >     <description>
> >       Maximum percent of resources in the cluster which can be used 
> > to
> run
> >       application masters i.e. controls number of concurrent running
> >       applications.
> >     </description>
> >   </property>
> >
> > The jobs are configured to reduce the memory usage:
> >
> > yarn.container.memory.mb=256
> > yarn.am.container.memory.mb=256
> >
> > After introducing these changes I experienced a very appreciable 
> > reduction of the speed. It seemed normal as the memory assigned to 
> > the jobs  was lowered and there were more running.  It was running 
> > until yesterday but today I am seeing that
> >
> > What I have seen today is that they are not moving from ACCEPTED to 
> > RUNNING. I have found the following in the log (full log at the end):
> >
> > 2015-09-22 09:54:36,661 INFO  [Container Monitor] 
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408))
> > - Memory usage of ProcessTree 10346 for container-id
> > container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical 
> > memory used; 1.2 GB of 537.6 MB virtual memory used
> >
> > I am not sure where that 1.2 Gb comes from and makes the processes dye.
> >
> > Thanks,
> >
> >    Jordi
> >
> >
> >
> >
> > 2015-09-22 09:54:36,519 INFO  [Container Monitor] 
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458))
> > - Removed ProcessTree with root 10271
> > 2015-09-22 09:54:36,519 INFO  [AsyncDispatcher event handler] 
> > container.Container (ContainerImpl.java:handle(999)) - Container
> > container_1442908447829_0002_01_000001 transitioned from RUNNING to 
> > KILLING
> > 2015-09-22 09:54:36,533 INFO  [AsyncDispatcher event handler] 
> > launcher.ContainerLaunch
> > (ContainerLaunch.java:cleanupContainer(370))
> > - Cleaning up container container_1442908447829_0002_01_000001
> > 2015-09-22 09:54:36,661 INFO  [Container Monitor] 
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408))
> > - Memory usage of ProcessTree 10346 for container-id
> > container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical 
> > memory used; 1.2 GB of 537.6 MB virtual memory used
> > 2015-09-22 09:54:36,661 WARN  [Container Monitor] 
> > monitor.ContainersMonitorImpl
> > (ContainersMonitorImpl.java:isProcessTreeOverLimit(293)) - Process 
> > tree for
> > container: container_1442908447829_0001_01_000001 running over twice 
> > the configured limit. Limit=563714432, current usage = 1269743616
> > 2015-09-22 09:54:36,662 WARN  [Container Monitor] 
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(447))
> > - Container
> > [pid=10346,containerID=container_1442908447829_0001_01_000001] is 
> > running beyond virtual memory limits. Current usage: 70.0 MB of 256 
> > MB physical memory used; 1.2 GB of 537.6 MB virtual memory used.
> > Killing
> container.
> > Dump of the process-tree for container_1442908447829_0001_01_000001 :
> >         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> > SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> >         |- 10346 10344 10346 10346 (java) 253 7 1269743616 17908 
> > /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server 
> > -Dsamza.container.name=samza-application-master
> > -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_14429084
> > 47
> > 829_0001/container_1442908447829_0001_01_000001
> > -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcac
> > he
> > /application_1442908447829_0001/container_1442908447829_0001_01_0000
> > 01
> > /__package/tmp
> > -Xmx768M -XX:+PrintGCDateStamps
> > -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_00
> > 01 /container_1442908447829_0001_01_000001/gc.log
> > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> > -XX:GCLogFileSize=10241024 -d64 -cp
> > /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/
> > ap
> > pcache/application_1442908447829_0001/container_1442908447829_0001_0
> > 1_
> > 000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/
> > nm
> > -local-dir/usercache/root/appcache/application_1442908447829_0001/co
> > nt
> > ainer_1442908447829_0001_01_000001/__package/lib/jackson-core-2.6.0.
> > ja
> > r:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_
> > 14
> > 42908447829_0001/container_1442908447829_0001_01_000001/__package/li
> > b/
> > jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/r
> > oo
> > t/appcache/application_1442908447829_0001/container_1442908447829_00
> > 01
> > _01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/had
> > oo
> > p-root/nm-local-dir/usercache/root/appcache/application_144290844782
> > 9_
> > 0001/container_1442908447829_0001_01_000001/__package/lib/jackson-ja
> > xr
> > s-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/ro
> > ot
> > /appcache/application_1442908447829_0001/container_1442908447829_000
> > 1_
> > 01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar:/t
> > mp
> > /hadoop-root/nm-local-dir/usercache/root/appcache/application_144290
> > 84
> > 47829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtB
> > ro
> > ker-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/
> > ap
> > plication_1442908447829_0001/container_1442908447829_0001_01_000001/
> > __ package/lib/nxtBroker-0.0.1-jar-with-dependencies.jar
> > org.apache.samza.job.yarn.SamzaAppMaster
> >
> > 2015-09-22 09:54:36,663 INFO  [Container Monitor] 
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458))
> > - Removed ProcessTree with root 10346
> > 2015-09-22 09:54:36,663 INFO  [AsyncDispatcher event handler] 
> > container.Container (ContainerImpl.java:handle(999)) - Container
> > container_1442908447829_0001_01_000001 transitioned from RUNNING to 
> > KILLING
> > 2015-09-22 09:54:36,663 INFO  [AsyncDispatcher event handler] 
> > launcher.ContainerLaunch
> > (ContainerLaunch.java:cleanupContainer(370))
> > - Cleaning up container container_1442908447829_0001_01_000001
> > ________________________________
> > Jordi Blasi Uribarri
> > Área I+D+i
> >
> > jblasi@nextel.es
> > Oficina Bilbao
> >
> > [http://www.nextel.es/wp-content/uploads/Firma_Nextel_2015.png]
> > ________________________________
> > Jordi Blasi Uribarri
> >
>

RE: container is running beyond virtual memory limits

Posted by Jordi Blasi Uribarri <jb...@nextel.es>.
I just changed the task options file to add the following line:

task.opts=-Xmx128M

And I found no change in the behaviour. I see that the job is being launched with the default -Xmx768M value:

root      8296  8294  1 11:16 ?        00:00:05 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server -Dsamza.container.name=samza-application-master -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_1443431699703_0003/container_1443431699703_0003_01_000001 -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/tmp -Xmx768M -XX:+PrintGCDateStamps -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1443431699703_0003/container_1443431699703_0003_01_000001/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10241024 -d64 -cp /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/DataAnalyzer-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/DataAnalyzer-0.0.1-jar-with-dependencies.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-core-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-jaxrs-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar org.apache.samza.job.yarn.SamzaAppMaster

How do I set the correct value?

Thanks,

   Jordi

-----Original Message-----
From: Yi Pan [mailto:nickpan47@gmail.com]
Sent: Monday, September 28, 2015 10:56
To: dev@samza.apache.org
Subject: Re: container is running beyond virtual memory limits

Hi, Jordi,

Please find the config variable task.opts in this table:
http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html

This allows you to add additional JVM opts when launching the containers.

-Yi

On Mon, Sep 28, 2015 at 1:48 AM, Jordi Blasi Uribarri <jb...@nextel.es>
wrote:

> The three tasks have a similar options file, like this one.
>
> task.class=flow.OperationJob
> job.name=flow.OperationJob
> job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
> yarn.package.path=http://IP/javaapp.tar.gz
>
>
> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemF
> actory
> systems.kafka.consumer.zookeeper.connect=kfk-kafka01:2181,kfk-kafka02:
> 2181
>
> systems.kafka.producer.bootstrap.servers=kfk-kafka01:9092,kfk-kafka01:
> 9093,kfk-kafka02:9092,kfk-kafka02:9093
>
> systems.kafka.producer.metadata.broker.list=kfk-kafka01:9092,kfk-kafka
> 01:9093,kfk-kafka02:9092,kfk-kafka02:909
>
>
> task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpo
> intManagerFactory
> task.checkpoint.system=kafka
> task.inputs=kafka.operationtpc
>
>
> serializers.registry.json.class=org.apache.samza.serializers.JsonSerde
> Factory
>
> serializers.registry.string.class=org.apache.samza.serializers.StringS
> erdeFactory
>
> systems.kafka.samza.msg.serde=string
> systems.kafka.streams.tracetpc.samza.msg.serde=json
>
> yarn.container.memory.mb=256
> yarn.am.container.memory.mb=256
>
> task.commit.ms=1000
> task.window.ms=60000
>
> Where do I have to change the XMX parameter?
>
> Thanks.
>
>      Jordi
>
>
> -----Mensaje original-----
> De: Yi Pan [mailto:nickpan47@gmail.com] Enviado el: lunes, 28 de 
> septiembre de 2015 10:39
> Para: dev@samza.apache.org
> Asunto: Re: container is running beyond virtual memory limits
>
> Hi, Jordi,
>
> Can you post your task.opts settings as well? The Xms and Xmx JVM opts 
> will play a role here as well. The Xmx size should be set to less than 
> yarn.container.memory.mb.
>
> -Yi
>
> On Tue, Sep 22, 2015 at 4:32 AM, Jordi Blasi Uribarri 
> <jb...@nextel.es>
> wrote:
>
> > I am seeing that I can not get even a single job running. I have 
> > recovered the original configuration of yarn-site.xml and 
> > capacity-scheduler.xml and that does not work. I am thinking that 
> > maybe there is some kind of information related to old jobs that 
> > have not been correctly cleaned when killing them. Is there any 
> > place where I can look to remove temporary files or something similar?
> >
> > Thanks
> >
> >         jordi
> >
> > -----Mensaje original-----
> > De: Jordi Blasi Uribarri [mailto:jblasi@nextel.es] Enviado el: 
> > martes,
> > 22 de septiembre de 2015 10:06
> > Para: dev@samza.apache.org
> > Asunto: container is running beyond virtual memory limits
> >
> > Hi,
> >
> > I am not really sure If this is related to any of the previous 
> > questions so I am asking it in a new message. I am running three 
> > different samza jobs that perform different actions and interchange 
> > information. As I found limits in the memory that were preventing 
> > the jobs to get from Accepted to Running I introduced some 
> > configurations in
> Yarn, as suggested in this list:
> >
> >
> > yarn-site.xml
> >
> > <configuration>
> >   <property>
> >     <name>yarn.scheduler.minimum-allocation-mb</name>
> >     <value>128</value>
> >     <description>Minimum limit of memory to allocate to each 
> > container request at the Resource Manager.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.maximum-allocation-mb</name>
> >     <value>512</value>
> >     <description>Maximum limit of memory to allocate to each 
> > container request at the Resource Manager.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.minimum-allocation-vcores</name>
> >     <value>1</value>
> >     <description>The minimum allocation for every container request 
> > at the RM, in terms of virtual CPU cores. Requests lower than this 
> > won't take effect, and the specified value will get allocated the 
> > minimum.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.maximum-allocation-vcores</name>
> >     <value>2</value>
> >     <description>The maximum allocation for every container request 
> > at the RM, in terms of virtual CPU cores. Requests higher than this 
> > won't take effect, and will get capped to this value.</description>
> >   </property>
> > <property>
> > <name>yarn.resourcemanager.hostname</name>
> > <value>kfk-samza01</value>
> > </property>
> > </configuration>
> >
> > capacity-scheduler.xml
> > Alter value
> >     <property>
> >     <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
> >     <value>0.5</value>
> >     <description>
> >       Maximum percent of resources in the cluster which can be used 
> > to
> run
> >       application masters i.e. controls number of concurrent running
> >       applications.
> >     </description>
> >   </property>
> >
> > The jobs are configured to reduce the memory usage:
> >
> > yarn.container.memory.mb=256
> > yarn.am.container.memory.mb=256
> >
> > After introducing these changes I experienced a very appreciable 
> > reduction of the speed. It seemed normal as the memory assigned to 
> > the jobs  was lowered and there were more running.  It was running 
> > until yesterday but today I am seeing that
> >
> > What I have seen today is that they are not moving from ACCEPTED to 
> > RUNNING. I have found the following in the log (full log at the end):
> >
> > 2015-09-22 09:54:36,661 INFO  [Container Monitor] 
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) 
> > - Memory usage of ProcessTree 10346 for container-id
> > container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical 
> > memory used; 1.2 GB of 537.6 MB virtual memory used
> >
> > I am not sure where that 1.2 Gb comes from and makes the processes dye.
> >
> > Thanks,
> >
> >    Jordi
> >
> >
> >
> >
> > 2015-09-22 09:54:36,519 INFO  [Container Monitor] 
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) 
> > - Removed ProcessTree with root 10271
> > 2015-09-22 09:54:36,519 INFO  [AsyncDispatcher event handler] 
> > container.Container (ContainerImpl.java:handle(999)) - Container
> > container_1442908447829_0002_01_000001 transitioned from RUNNING to 
> > KILLING
> > 2015-09-22 09:54:36,533 INFO  [AsyncDispatcher event handler] 
> > launcher.ContainerLaunch 
> > (ContainerLaunch.java:cleanupContainer(370))
> > - Cleaning up container container_1442908447829_0002_01_000001
> > 2015-09-22 09:54:36,661 INFO  [Container Monitor] 
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) 
> > - Memory usage of ProcessTree 10346 for container-id
> > container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical 
> > memory used; 1.2 GB of 537.6 MB virtual memory used
> > 2015-09-22 09:54:36,661 WARN  [Container Monitor] 
> > monitor.ContainersMonitorImpl
> > (ContainersMonitorImpl.java:isProcessTreeOverLimit(293)) - Process 
> > tree for
> > container: container_1442908447829_0001_01_000001 running over twice 
> > the configured limit. Limit=563714432, current usage = 1269743616
> > 2015-09-22 09:54:36,662 WARN  [Container Monitor] 
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(447)) 
> > - Container 
> > [pid=10346,containerID=container_1442908447829_0001_01_000001] is 
> > running beyond virtual memory limits. Current usage: 70.0 MB of 256 
> > MB physical memory used; 1.2 GB of 537.6 MB virtual memory used. 
> > Killing
> container.
> > Dump of the process-tree for container_1442908447829_0001_01_000001 :
> >         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> > SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> >         |- 10346 10344 10346 10346 (java) 253 7 1269743616 17908 
> > /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server 
> > -Dsamza.container.name=samza-application-master
> > -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_14429084
> > 47
> > 829_0001/container_1442908447829_0001_01_000001
> > -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcac
> > he
> > /application_1442908447829_0001/container_1442908447829_0001_01_0000
> > 01
> > /__package/tmp
> > -Xmx768M -XX:+PrintGCDateStamps
> > -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_00
> > 01 /container_1442908447829_0001_01_000001/gc.log
> > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> > -XX:GCLogFileSize=10241024 -d64 -cp
> > /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/
> > ap 
> > pcache/application_1442908447829_0001/container_1442908447829_0001_0
> > 1_ 
> > 000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/
> > nm 
> > -local-dir/usercache/root/appcache/application_1442908447829_0001/co
> > nt 
> > ainer_1442908447829_0001_01_000001/__package/lib/jackson-core-2.6.0.
> > ja
> > r:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_
> > 14 
> > 42908447829_0001/container_1442908447829_0001_01_000001/__package/li
> > b/ 
> > jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/r
> > oo
> > t/appcache/application_1442908447829_0001/container_1442908447829_00
> > 01 
> > _01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/had
> > oo 
> > p-root/nm-local-dir/usercache/root/appcache/application_144290844782
> > 9_ 
> > 0001/container_1442908447829_0001_01_000001/__package/lib/jackson-ja
> > xr 
> > s-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/ro
> > ot 
> > /appcache/application_1442908447829_0001/container_1442908447829_000
> > 1_ 
> > 01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar:/t
> > mp
> > /hadoop-root/nm-local-dir/usercache/root/appcache/application_144290
> > 84 
> > 47829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtB
> > ro 
> > ker-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/
> > ap 
> > plication_1442908447829_0001/container_1442908447829_0001_01_000001/
> > __ package/lib/nxtBroker-0.0.1-jar-with-dependencies.jar
> > org.apache.samza.job.yarn.SamzaAppMaster
> >
> > 2015-09-22 09:54:36,663 INFO  [Container Monitor] 
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) 
> > - Removed ProcessTree with root 10346
> > 2015-09-22 09:54:36,663 INFO  [AsyncDispatcher event handler] 
> > container.Container (ContainerImpl.java:handle(999)) - Container
> > container_1442908447829_0001_01_000001 transitioned from RUNNING to 
> > KILLING
> > 2015-09-22 09:54:36,663 INFO  [AsyncDispatcher event handler] 
> > launcher.ContainerLaunch 
> > (ContainerLaunch.java:cleanupContainer(370))
> > - Cleaning up container container_1442908447829_0001_01_000001
> > ________________________________
> > Jordi Blasi Uribarri
> > Área I+D+i
> >
> > jblasi@nextel.es
> > Oficina Bilbao
> >
> > [http://www.nextel.es/wp-content/uploads/Firma_Nextel_2015.png]
> > ________________________________
> > Jordi Blasi Uribarri
> >
>

Re: container is running beyond virtual memory limits

Posted by Yi Pan <ni...@gmail.com>.
Hi, Jordi,

Please find the config variable task.opts in this table:
http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html

This allows you to add additional JVM opts when launching the containers.
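
For example, something like this (illustrative values only, keeping -Xmx below yarn.container.memory.mb as mentioned in my earlier mail):

yarn.container.memory.mb=256
task.opts=-Xms128M -Xmx128M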

-Yi

On Mon, Sep 28, 2015 at 1:48 AM, Jordi Blasi Uribarri <jb...@nextel.es>
wrote:

> The three tasks have a similar options file, like this one.
>
> task.class=flow.OperationJob
> job.name=flow.OperationJob
> job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
> yarn.package.path=http://IP/javaapp.tar.gz
>
>
> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
> systems.kafka.consumer.zookeeper.connect=kfk-kafka01:2181,kfk-kafka02:2181
>
> systems.kafka.producer.bootstrap.servers=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:9093
>
> systems.kafka.producer.metadata.broker.list=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:909
>
>
> task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
> task.checkpoint.system=kafka
> task.inputs=kafka.operationtpc
>
>
> serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
>
> serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
>
> systems.kafka.samza.msg.serde=string
> systems.kafka.streams.tracetpc.samza.msg.serde=json
>
> yarn.container.memory.mb=256
> yarn.am.container.memory.mb=256
>
> task.commit.ms=1000
> task.window.ms=60000
>
> Where do I have to change the XMX parameter?
>
> Thanks.
>
>      Jordi
>
>
> -----Mensaje original-----
> De: Yi Pan [mailto:nickpan47@gmail.com]
> Enviado el: lunes, 28 de septiembre de 2015 10:39
> Para: dev@samza.apache.org
> Asunto: Re: container is running beyond virtual memory limits
>
> Hi, Jordi,
>
> Can you post your task.opts settings as well? The Xms and Xmx JVM opts
> will play a role here as well. The Xmx size should be set to less than
> yarn.container.memory.mb.
>
> -Yi
>
> On Tue, Sep 22, 2015 at 4:32 AM, Jordi Blasi Uribarri <jb...@nextel.es>
> wrote:
>
> > I am seeing that I can not get even a single job running. I have
> > recovered the original configuration of yarn-site.xml and
> > capacity-scheduler.xml and that does not work. I am thinking that
> > maybe there is some kind of information related to old jobs that have
> > not been correctly cleaned when killing them. Is there any place where
> > I can look to remove temporary files or something similar?
> >
> > Thanks
> >
> >         jordi
> >
> > -----Mensaje original-----
> > De: Jordi Blasi Uribarri [mailto:jblasi@nextel.es] Enviado el: martes,
> > 22 de septiembre de 2015 10:06
> > Para: dev@samza.apache.org
> > Asunto: container is running beyond virtual memory limits
> >
> > Hi,
> >
> > I am not really sure If this is related to any of the previous
> > questions so I am asking it in a new message. I am running three
> > different samza jobs that perform different actions and interchange
> > information. As I found limits in the memory that were preventing the
> > jobs to get from Accepted to Running I introduced some configurations in
> Yarn, as suggested in this list:
> >
> >
> > yarn-site.xml
> >
> > <configuration>
> >   <property>
> >     <name>yarn.scheduler.minimum-allocation-mb</name>
> >     <value>128</value>
> >     <description>Minimum limit of memory to allocate to each container
> > request at the Resource Manager.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.maximum-allocation-mb</name>
> >     <value>512</value>
> >     <description>Maximum limit of memory to allocate to each container
> > request at the Resource Manager.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.minimum-allocation-vcores</name>
> >     <value>1</value>
> >     <description>The minimum allocation for every container request at
> > the RM, in terms of virtual CPU cores. Requests lower than this won't
> > take effect, and the specified value will get allocated the
> > minimum.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.maximum-allocation-vcores</name>
> >     <value>2</value>
> >     <description>The maximum allocation for every container request at
> > the RM, in terms of virtual CPU cores. Requests higher than this won't
> > take effect, and will get capped to this value.</description>
> >   </property>
> > <property>
> > <name>yarn.resourcemanager.hostname</name>
> > <value>kfk-samza01</value>
> > </property>
> > </configuration>
> >
> > capacity-scheduler.xml
> > Alter value
> >     <property>
> >     <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
> >     <value>0.5</value>
> >     <description>
> >       Maximum percent of resources in the cluster which can be used to
> run
> >       application masters i.e. controls number of concurrent running
> >       applications.
> >     </description>
> >   </property>
> >
> > The jobs are configured to reduce the memory usage:
> >
> > yarn.container.memory.mb=256
> > yarn.am.container.memory.mb=256
> >
> > After introducing these changes I experienced a very appreciable
> > reduction of the speed. It seemed normal as the memory assigned to the
> > jobs  was lowered and there were more running.  It was running until
> > yesterday but today I am seeing that
> >
> > What I have seen today is that they are not moving from ACCEPTED to
> > RUNNING. I have found the following in the log (full log at the end):
> >
> > 2015-09-22 09:54:36,661 INFO  [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) -
> > Memory usage of ProcessTree 10346 for container-id
> > container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical
> > memory used; 1.2 GB of 537.6 MB virtual memory used
> >
> > I am not sure where that 1.2 Gb comes from and makes the processes dye.
> >
> > Thanks,
> >
> >    Jordi
> >
> >
> >
> >
> > 2015-09-22 09:54:36,519 INFO  [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) -
> > Removed ProcessTree with root 10271
> > 2015-09-22 09:54:36,519 INFO  [AsyncDispatcher event handler]
> > container.Container (ContainerImpl.java:handle(999)) - Container
> > container_1442908447829_0002_01_000001 transitioned from RUNNING to
> > KILLING
> > 2015-09-22 09:54:36,533 INFO  [AsyncDispatcher event handler]
> > launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370))
> > - Cleaning up container container_1442908447829_0002_01_000001
> > 2015-09-22 09:54:36,661 INFO  [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) -
> > Memory usage of ProcessTree 10346 for container-id
> > container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical
> > memory used; 1.2 GB of 537.6 MB virtual memory used
> > 2015-09-22 09:54:36,661 WARN  [Container Monitor]
> > monitor.ContainersMonitorImpl
> > (ContainersMonitorImpl.java:isProcessTreeOverLimit(293)) - Process
> > tree for
> > container: container_1442908447829_0001_01_000001 running over twice
> > the configured limit. Limit=563714432, current usage = 1269743616
> > 2015-09-22 09:54:36,662 WARN  [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(447)) -
> > Container
> > [pid=10346,containerID=container_1442908447829_0001_01_000001] is
> > running beyond virtual memory limits. Current usage: 70.0 MB of 256 MB
> > physical memory used; 1.2 GB of 537.6 MB virtual memory used. Killing
> container.
> > Dump of the process-tree for container_1442908447829_0001_01_000001 :
> >         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> > SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> >         |- 10346 10344 10346 10346 (java) 253 7 1269743616 17908
> > /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server
> > -Dsamza.container.name=samza-application-master
> > -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_1442908447
> > 829_0001/container_1442908447829_0001_01_000001
> > -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache
> > /application_1442908447829_0001/container_1442908447829_0001_01_000001
> > /__package/tmp
> > -Xmx768M -XX:+PrintGCDateStamps
> > -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001
> > /container_1442908447829_0001_01_000001/gc.log
> > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> > -XX:GCLogFileSize=10241024 -d64 -cp
> > /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/ap
> > pcache/application_1442908447829_0001/container_1442908447829_0001_01_
> > 000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/nm
> > -local-dir/usercache/root/appcache/application_1442908447829_0001/cont
> > ainer_1442908447829_0001_01_000001/__package/lib/jackson-core-2.6.0.ja
> > r:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_14
> > 42908447829_0001/container_1442908447829_0001_01_000001/__package/lib/
> > jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/roo
> > t/appcache/application_1442908447829_0001/container_1442908447829_0001
> > _01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/hadoo
> > p-root/nm-local-dir/usercache/root/appcache/application_1442908447829_
> > 0001/container_1442908447829_0001_01_000001/__package/lib/jackson-jaxr
> > s-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root
> > /appcache/application_1442908447829_0001/container_1442908447829_0001_
> > 01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar:/tmp
> > /hadoop-root/nm-local-dir/usercache/root/appcache/application_14429084
> > 47829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBro
> > ker-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/ap
> > plication_1442908447829_0001/container_1442908447829_0001_01_000001/__
> > package/lib/nxtBroker-0.0.1-jar-with-dependencies.jar
> > org.apache.samza.job.yarn.SamzaAppMaster
> >
> > 2015-09-22 09:54:36,663 INFO  [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) -
> > Removed ProcessTree with root 10346
> > 2015-09-22 09:54:36,663 INFO  [AsyncDispatcher event handler]
> > container.Container (ContainerImpl.java:handle(999)) - Container
> > container_1442908447829_0001_01_000001 transitioned from RUNNING to
> > KILLING
> > 2015-09-22 09:54:36,663 INFO  [AsyncDispatcher event handler]
> > launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370))
> > - Cleaning up container container_1442908447829_0001_01_000001
> > ________________________________
> > Jordi Blasi Uribarri
> > Área I+D+i
> >
> > jblasi@nextel.es
> > Oficina Bilbao
> >
> > [http://www.nextel.es/wp-content/uploads/Firma_Nextel_2015.png]
> > ________________________________
> > Jordi Blasi Uribarri
> >
>

RE: container is running beyond virtual memory limits

Posted by Jordi Blasi Uribarri <jb...@nextel.es>.
The three tasks have a similar options file, like this one.

task.class=flow.OperationJob
job.name=flow.OperationJob
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
yarn.package.path=http://IP/javaapp.tar.gz

systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.consumer.zookeeper.connect=kfk-kafka01:2181,kfk-kafka02:2181
systems.kafka.producer.bootstrap.servers=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:9093
systems.kafka.producer.metadata.broker.list=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:909

task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.system=kafka
task.inputs=kafka.operationtpc

serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory

systems.kafka.samza.msg.serde=string
systems.kafka.streams.tracetpc.samza.msg.serde=json

yarn.container.memory.mb=256
yarn.am.container.memory.mb=256

task.commit.ms=1000
task.window.ms=60000

Where do I have to change the XMX parameter?

Thanks.

     Jordi


-----Original Message-----
From: Yi Pan [mailto:nickpan47@gmail.com]
Sent: Monday, September 28, 2015 10:39
To: dev@samza.apache.org
Subject: Re: container is running beyond virtual memory limits

Hi, Jordi,

Can you post your task.opts settings as well? The Xms and Xmx JVM opts will play a role here as well. The Xmx size should be set to less than yarn.container.memory.mb.

-Yi

On Tue, Sep 22, 2015 at 4:32 AM, Jordi Blasi Uribarri <jb...@nextel.es>
wrote:

> I am seeing that I can not get even a single job running. I have 
> recovered the original configuration of yarn-site.xml and 
> capacity-scheduler.xml and that does not work. I am thinking that 
> maybe there is some kind of information related to old jobs that have 
> not been correctly cleaned when killing them. Is there any place where 
> I can look to remove temporary files or something similar?
>
> Thanks
>
>         jordi
>
> -----Mensaje original-----

Re: container is running beyond virtual memory limits

Posted by Yi Pan <ni...@gmail.com>.
Hi, Jordi,

Can you post your task.opts settings as well? The Xms and Xmx JVM opts also
play a role here. The Xmx size should be set lower than
yarn.container.memory.mb.
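
(537.6 MB is simply 256 MB times the NodeManager's default virtual-to-physical
memory ratio of 2.1, and the process dump earlier in the thread shows the JVM
launched with -Xmx768M against a 256 MB container, which by itself can push the
virtual size past that limit.) As a rough illustration of the relationship,
with placeholder numbers rather than tuned values:

yarn.container.memory.mb=256
# keep the task container heap well below the 256 MB request (illustrative value)
task.opts=-Xms128m -Xmx160m

yarn.am.container.memory.mb=256
# yarn.am.opts accepts JVM options for the application master in the same way (illustrative value)
yarn.am.opts=-Xms128m -Xmx160m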

-Yi


RE: container is running beyond virtual memory limits

Posted by Jordi Blasi Uribarri <jb...@nextel.es>.
I am seeing that I cannot get even a single job running now. I have restored
the original yarn-site.xml and capacity-scheduler.xml, and that does not help
either. I suspect there may be some state from old jobs that was not cleaned up
correctly when they were killed. Is there any place I can look to remove
temporary files or something similar?
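
For reference, the per-job directories I have been checking are the NodeManager
application cache and the container logs, i.e. the same paths that appear in
the container command lines earlier in the thread (these are the paths on my
setup; they follow yarn.nodemanager.local-dirs and the Hadoop log dir):

# per-application scratch space kept by the NodeManager for each submitted job
ls -lh /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/
# container logs (including the gc.log files) for old applications
ls /opt/hadoop-2.6.0/logs/userlogs/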

Thanks

	jordi
