Posted to dev@samza.apache.org by sgg <sg...@gmail.com> on 2013/12/18 18:36:20 UTC

multiple Samza App Masters

Each time I run run-job.sh, I seem to be getting a new separate SamzaAppMaster.  That seems like a lot of overhead.

Is it somehow possible to have multiple samza jobs share the same SamzaAppMaster?

sgg

Re: multiple Samza App Masters

Posted by Chris Riccomini <cr...@linkedin.com>.
Hey sgg,

The default container memory is:

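object YarnJob {
  // Default memory requested for the AM's YARN container, in MB
  // (the value behind yarn.am.container.memory.mb).
  val DEFAULT_AM_CONTAINER_MEM = 1024
}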

The default -Xmx for the AM is 768 MB.


This memory setting is a bit excessive, but not terribly so. When we were
setting Xmx in the 256m range, we were seeing heap space issues because of
the YARN AM's web UI dashboard (scalatra+scalate). The reason for the gap
between Xmx and the container size is to allow for some page cache for the
process, permgen, JVM overhead, etc.
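
To make the gap concrete, here is a sketch of the defaults above written as
job properties. Writing the heap flag as -Xmx768m assumes the 768 is in
megabytes, and spelling the defaults out in a properties file is only for
illustration; the actual defaults live in the code.

```properties
# Container size requested from YARN for the AM (DEFAULT_AM_CONTAINER_MEM).
yarn.am.container.memory.mb=1024
# AM heap ceiling; assumes the default of 768 means megabytes.
yarn.am.opts=-Xmx768m
# Headroom outside the heap: 1024 - 768 = 256 MB for permgen,
# JVM overhead, page cache, etc.
```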

With YARN, there's also a subtlety between the physical and virtual memory
allocations, which caused us to bump up the default memory for both the AM
and Samza containers. This issue is documented in some detail here:

  https://github.com/linkedin/hello-samza/issues/4


Cheers,
Chris

On 12/18/13 9:52 AM, "sgg" <sg...@gmail.com> wrote:

>Ok thanks Chris.  
>
>Is there a quick way to examine how much memory the App Masters are
>consuming?  What is the default setting for yarn.am.container.memory.mb?
>How was that value determined?  Is it an "ample" setting or is it a bare
>minimum?
>
>sgg
>On Dec 18, 2013, at 12:42 PM, Chris Riccomini <cr...@linkedin.com>
>wrote:
>
>> Hey sgg,
>> 
>> Samza's model is one YARN AppMaster per Samza job. This means that if you
>> run two separate jobs using run-job.sh, you'll end up with two AMs.
>> 
>> The overhead of the AM is really just memory (it's not CPU or disk
>> intensive), and this is adjustable using:
>> 
>>  yarn.am.opts
>>  yarn.am.container.memory.mb
>> 
>> There is no way to run multiple jobs from the same AM. If you are really
>> concerned about this, you can collapse your Samza job logic into a single
>> job. You can even have the job talk to itself, if you need to repartition
>> data (e.g. have the output also be the input).
>> 
>> 
>> Cheers,
>> Chris
>> 
>> On 12/18/13 9:36 AM, "sgg" <sg...@gmail.com> wrote:
>> 
>>> Each time I run run-job.sh, I seem to be getting a new separate
>>> SamzaAppMaster.  That seems like a lot of overhead.
>>> 
>>> Is it somehow possible to have multiple samza jobs share the same
>>> SamzaAppMaster?
>>> 
>>> sgg
>> 
>


Re: multiple Samza App Masters

Posted by sgg <sg...@gmail.com>.
Ok thanks Chris.  

Is there a quick way to examine how much memory the App Masters are consuming?  What is the default setting for yarn.am.container.memory.mb?  How was that value determined?  Is it an "ample" setting or is it a bare minimum?

sgg
On Dec 18, 2013, at 12:42 PM, Chris Riccomini <cr...@linkedin.com> wrote:

> Hey sgg,
> 
> Samza's model is one YARN AppMaster per Samza job. This means that if you
> run two separate jobs using run-job.sh, you'll end up with two AMs.
> 
> The overhead of the AM is really just memory (it's not CPU or disk
> intensive), and this is adjustable using:
> 
>  yarn.am.opts
>  yarn.am.container.memory.mb
> 
> There is no way to run multiple jobs from the same AM. If you are really
> concerned about this, you can collapse your Samza job logic into a single
> job. You can even have the job talk to itself, if you need to repartition
> data (e.g. have the output also be the input).
> 
> 
> Cheers,
> Chris
> 
> On 12/18/13 9:36 AM, "sgg" <sg...@gmail.com> wrote:
> 
>> Each time I run run-job.sh, I seem to be getting a new separate
>> SamzaAppMaster.  That seems like a lot of overhead.
>> 
>> Is it somehow possible to have multiple samza jobs share the same
>> SamzaAppMaster?
>> 
>> sgg
> 


Re: multiple Samza App Masters

Posted by Chris Riccomini <cr...@linkedin.com>.
Hey sgg,

Samza's model is one YARN AppMaster per Samza job. This means that if you
run two separate jobs using run-job.sh, you'll end up with two AMs.

The overhead of the AM is really just memory (it's not CPU or disk
intensive), and this is adjustable using:

  yarn.am.opts
  yarn.am.container.memory.mb
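
As a sketch, both settings would go in the configuration of the job being
submitted with run-job.sh. The values here are illustrative assumptions, not
tested recommendations; the one constraint worth keeping is that the heap
(-Xmx) stays below the container size so the JVM has room outside the heap.

```properties
# Illustrative values only, not recommendations.
# Memory requested from YARN for this job's AM container, in MB.
yarn.am.container.memory.mb=512
# JVM options for the AM; the heap is kept below the container size.
yarn.am.opts=-Xmx384m
```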

There is no way to run multiple jobs from the same AM. If you are really
concerned about this, you can collapse your Samza job logic into a single
job. You can even have the job talk to itself, if you need to repartition
data (e.g. have the output also be the input).
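
A rough sketch of the "output is also the input" idea, assuming a Kafka
system named kafka and two hypothetical topics, events and events-by-key:
the job lists both topics as inputs, and its task code writes re-keyed
messages to the second one.

```properties
# Hypothetical topic names; "kafka" is an assumed system name.
#   events        : the raw input
#   events-by-key : written by this same job, then read back by it
task.inputs=kafka.events,kafka.events-by-key
```

The write to events-by-key happens in the task itself, not in configuration.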


Cheers,
Chris

On 12/18/13 9:36 AM, "sgg" <sg...@gmail.com> wrote:

>Each time I run run-job.sh, I seem to be getting a new separate
>SamzaAppMaster.  That seems like a lot of overhead.
>
>Is it somehow possible to have multiple samza jobs share the same
>SamzaAppMaster?
>
>sgg