Posted to dev@samza.apache.org by Jordi Blasi Uribarri <jb...@nextel.es> on 2015/09/11 13:21:41 UTC

memory limits

Hi,

I am trying to implement an environment that requires multiple combined Samza jobs for different tasks. I see that there is a limit to the number of jobs that can run at the same time, as each one blocks 1 GB of RAM. I understand that this is a reasonable limit in a production environment (since we are talking about Big Data, we need big amounts of resources ☺), but my lab does not have that much RAM. Is there a way to reduce this limit so I can test properly? I am using Samza 0.9.

Thanks in advance,

   Jordi
________________________________
Jordi Blasi Uribarri
Área I+D+i

jblasi@nextel.es
Oficina Bilbao


RE: memory limits

Posted by Mark Mindenhall <ma...@machineshop.io>.
I also had this issue, and it was resolved by changing settings in yarn-site.xml and capacity-scheduler.xml. The amount of memory (and the number of virtual CPUs) allocated to your jobs is controlled by settings in yarn-site.xml. And I suspect you're seeing jobs go into ACCEPTED instead of RUNNING because the default value (0.1) of yarn.scheduler.capacity.maximum-am-resource-percent in capacity-scheduler.xml is too low.
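
To put rough numbers on both effects (illustrative figures; plug in your own cluster sizes): YARN rounds every container request up to the nearest multiple of yarn.scheduler.minimum-allocation-mb, which defaults to 1024. So a job that requests 128 MB still occupies

  AM container:   128 MB requested -> rounded up to 1024 MB
  job container:  128 MB requested -> rounded up to 1024 MB
  total per job:  2048 MB

which is why the web UI shows 2 GB per job. The AM cap then works out to roughly

  max concurrent AMs ~= (cluster memory x maximum-am-resource-percent) / AM allocation

so with 1024 MB AMs and the default 0.1, even a modest cluster lets only one or two jobs reach RUNNING while the rest wait in ACCEPTED.

With that in mind, here are the values I'm using in my staging cluster (just two m3.medium EC2 instances), where I typically request 256 MB per container.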

-----------------------------------------------------------------------------
yarn-site.xml
-----------------------------------------------------------------------------

<?xml version="1.0"?>
<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
    <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>512</value>
    <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
    <description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
    <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>3072</value>
    <description>Physical memory, in MB, to be made available to running containers</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
    <description>Number of CPU cores that can be allocated for containers.</description>
  </property>

  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>

  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>******.amazonaws.com</value>
  </property>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>86400</value>
  </property>

</configuration>
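
Note that yarn-site.xml is only read when the daemons start, so the ResourceManager and NodeManagers need a restart before these limits take effect. On a stock Hadoop 2.x layout (assuming you start YARN with the bundled sbin scripts) that would be:

  $HADOOP_HOME/sbin/stop-yarn.sh
  $HADOOP_HOME/sbin/start-yarn.sh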

-----------------------------------------------------------------------------
capacity-scheduler.xml
-----------------------------------------------------------------------------

<configuration>

  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
    <description>
      Maximum number of applications that can be pending and running.
    </description>
  </property>

  <!-- Changed by MM from 0.1 (default) to 0.5, as our Samza jobs typically
       have just one AppMaster and one job container. -->
  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.5</value>
    <description>
      Maximum percent of resources in the cluster which can be used to run
      application masters i.e. controls number of concurrent running
      applications.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    <description>
      The ResourceCalculator implementation to be used to compare
      Resources in the scheduler.
      The default i.e. DefaultResourceCalculator only uses Memory while
      DominantResourceCalculator uses dominant-resource to compare
      multi-dimensional resources such as Memory, CPU etc.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default</value>
    <description>
      The queues at this level (root is the root queue).
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>100</value>
    <description>Default queue target capacity.</description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description>
      Default queue user limit, a percentage from 0.0 to 1.0.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
    <description>
      The maximum capacity of the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description>
      The state of the default queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
    <description>
      The ACL of who can submit jobs to the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>*</value>
    <description>
      The ACL of who can administer jobs on the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>
      Number of missed scheduling opportunities after which the CapacityScheduler
      attempts to schedule rack-local containers.
      Typically this should be set to the number of nodes in the cluster. By
      default it is set to approximately the number of nodes in one rack,
      which is 40.
    </description>
  </property>
</configuration>
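
Unlike yarn-site.xml, capacity-scheduler.xml can usually be reloaded into a running ResourceManager without a restart:

  $HADOOP_HOME/bin/yarn rmadmin -refreshQueues

(rmadmin -refreshQueues is standard YARN admin tooling; whether a given property is picked up on refresh can depend on your Hadoop version, so restart if in doubt.)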




On September 15, 2015 at 5:05:28 AM, Jordi Blasi Uribarri (jblasi@nextel.es) wrote:

I have tried changing the configuration of all the jobs to this:

yarn.container.memory.mb=128
yarn.am.container.memory.mb=128

and on the startup I can see:

2015-09-15 12:40:18 ClientHelper [INFO] set memory request to 128 for application_1442313590092_0002

On the Hadoop web interface I see that every job is still getting 2 GB. In fact, only two of the jobs are in the RUNNING state, while the rest are ACCEPTED.

Any ideas?

Thanks,

Jordi

-----Original Message-----
From: Yan Fang [mailto:yanfang724@gmail.com]
Sent: Friday, September 11, 2015 20:56
To: dev@samza.apache.org
Subject: Re: memory limits

Hi Jordi,

I believe you can change the memory via yarn.container.memory.mb; the default is 1024. And yarn.am.container.memory.mb sets the AM memory.

See
http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html

Thanks,
Fang, Yan
yanfang724@gmail.com

On Fri, Sep 11, 2015 at 4:21 AM, Jordi Blasi Uribarri <jb...@nextel.es>
wrote:

> Hi,
>
> I am trying to implement an environment that requires multiple
> combined Samza jobs for different tasks. I see that there is a limit
> to the number of jobs that can run at the same time, as each one
> blocks 1 GB of RAM. I understand that this is a reasonable limit in a
> production environment (since we are talking about Big Data, we need
> big amounts of resources ☺), but my lab does not have that much RAM.
> Is there a way to reduce this limit so I can test properly? I am
> using Samza 0.9.
>
> Thanks in advance,
>
> Jordi
> ________________________________
> Jordi Blasi Uribarri
> Área I+D+i
>
> jblasi@nextel.es
> Oficina Bilbao
>
>

RE: memory limits

Posted by Jordi Blasi Uribarri <jb...@nextel.es>.
I have tried changing the configuration of all the jobs to this:

yarn.container.memory.mb=128
yarn.am.container.memory.mb=128

and on the startup I can see:

2015-09-15 12:40:18 ClientHelper [INFO] set memory request to 128 for application_1442313590092_0002

On the Hadoop web interface I see that every job is still getting 2 GB. In fact, only two of the jobs are in the RUNNING state, while the rest are ACCEPTED.

Any ideas?

Thanks,

    Jordi

-----Original Message-----
From: Yan Fang [mailto:yanfang724@gmail.com]
Sent: Friday, September 11, 2015 20:56
To: dev@samza.apache.org
Subject: Re: memory limits

Hi Jordi,

I believe you can change the memory via yarn.container.memory.mb; the default is 1024. And yarn.am.container.memory.mb sets the AM memory.

See
http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html

Thanks,
Fang, Yan
yanfang724@gmail.com

On Fri, Sep 11, 2015 at 4:21 AM, Jordi Blasi Uribarri <jb...@nextel.es>
wrote:

> Hi,
>
> I am trying to implement an environment that requires multiple
> combined Samza jobs for different tasks. I see that there is a limit
> to the number of jobs that can run at the same time, as each one
> blocks 1 GB of RAM. I understand that this is a reasonable limit in a
> production environment (since we are talking about Big Data, we need
> big amounts of resources ☺), but my lab does not have that much RAM.
> Is there a way to reduce this limit so I can test properly? I am
> using Samza 0.9.
>
> Thanks in advance,
>
>    Jordi
> ________________________________
> Jordi Blasi Uribarri
> Área I+D+i
>
> jblasi@nextel.es
> Oficina Bilbao
>
>

Re: memory limits

Posted by Yan Fang <ya...@gmail.com>.
Hi Jordi,

I believe you can change the memory via yarn.container.memory.mb; the
default is 1024. And yarn.am.container.memory.mb sets the AM memory.

See
http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html
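
For example, the relevant part of a job's .properties file would look something like this (a minimal sketch -- the job name is made up and 256 is just a lab-sized value; only the two yarn.* memory keys matter here):

  job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
  job.name=my-test-job
  # request small containers so several jobs fit on a small cluster
  yarn.container.memory.mb=256
  yarn.am.container.memory.mb=256

Keep in mind YARN still rounds these up to yarn.scheduler.minimum-allocation-mb, so lower that on the YARN side too if it sits above your request.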

Thanks,
Fang, Yan
yanfang724@gmail.com

On Fri, Sep 11, 2015 at 4:21 AM, Jordi Blasi Uribarri <jb...@nextel.es>
wrote:

> Hi,
>
> I am trying to implement an environment that requires multiple
> combined Samza jobs for different tasks. I see that there is a limit
> to the number of jobs that can run at the same time, as each one
> blocks 1 GB of RAM. I understand that this is a reasonable limit in a
> production environment (since we are talking about Big Data, we need
> big amounts of resources ☺), but my lab does not have that much RAM.
> Is there a way to reduce this limit so I can test properly? I am
> using Samza 0.9.
>
> Thanks in advance,
>
>    Jordi
> ________________________________
> Jordi Blasi Uribarri
> Área I+D+i
>
> jblasi@nextel.es
> Oficina Bilbao
>
>