Posted to dev@samza.apache.org by TJ Giuli <tg...@skyportsystems.com> on 2014/02/11 08:12:29 UTC

Samza job memory requirements

Folks, does anyone have experience they can share regarding memory allocation for Samza tasks? Out of the box, it looks like the ApplicationMaster defaults to 1GB of RAM for its container, with 1GB per YARN container for each TaskRunner.

Some of my Samza tasks are pretty simple and (I think) use very little runtime memory per partition, essentially following a pattern of read message, process, commit result to a database or a stream output, repeat. For these kinds of tasks, I'm assuming I can safely scale down the container memory bounds. What about the ApplicationMaster? Does it need a full GB per Samza task? Thanks!
—T

Re: Samza job memory requirements

Posted by TJ Giuli <tg...@skyportsystems.com>.
Great, Chris, thanks for the advice!
—T

On Feb 11, 2014, at 9:04 AM, Chris Riccomini <cr...@linkedin.com> wrote:

> Hey TJ,
> 
> For small containers, you can definitely drop the memory usage. There are
> several things to be aware of when doing this:
> 
> 1. YARN derives the virtual memory limit as a multiple of your physical
> memory allocation (2.1x by default, if memory serves). This means a 1G
> container gets 2.1G of VM. If you shrink the container below 1G, you
> shrink the VM limit along with it.
> 2. If your task interacts with disk, you should consider the OS page
> cache, and how much memory you'd like to leave for it. For example, your
> JVM and heap might only use 256M, but you might want the full gig at the
> container level in order to give yourself 768M of page cache for disk IO.
> 3. In practice, going below 256MB for Xmx and 384MB for
> yarn.container.memory.mb is pretty hard to get right.
> 4. If your job is processing a high-throughput stream, you might end up
> using a lot of memory in your eden space even if your task is totally
> stateless. In these scenarios, it is really helpful to use CMS (the
> concurrent mark-sweep collector) and increase the young gen size.
> 
> The AM actually uses a fair amount of memory because of the dashboard,
> which uses Scalatra and Scalate. These two end up chewing through a lot
> of memory when you view the dashboard in YARN. We were running the AM's
> YARN container at 768MB, and still seeing the NM kill the jobs
> occasionally. I'd recommend leaving the AM as it is, unless you're really
> pressed for memory in your YARN grid.
> 
> Cheers,
> Chris
> 


Re: Samza job memory requirements

Posted by Chris Riccomini <cr...@linkedin.com>.
Hey TJ,

For small containers, you can definitely drop the memory usage. There are
several things to be aware of when doing this:

1. YARN derives the virtual memory limit as a multiple of your physical
memory allocation (2.1x by default, if memory serves). This means a 1G
container gets 2.1G of VM. If you shrink the container below 1G, you
shrink the VM limit along with it.
2. If your task interacts with disk, you should consider the OS page
cache, and how much memory you'd like to leave for it. For example, your
JVM and heap might only use 256M, but you might want the full gig at the
container level in order to give yourself 768M of page cache for disk IO.
3. In practice, going below 256MB for Xmx and 384MB for
yarn.container.memory.mb is pretty hard to get right.
4. If your job is processing a high-throughput stream, you might end up
using a lot of memory in your eden space even if your task is totally
stateless. In these scenarios, it is really helpful to use CMS (the
concurrent mark-sweep collector) and increase the young gen size.
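
The arithmetic behind points 1 to 3 can be sketched roughly as below. This
is only an illustration: the function and field names are made up for the
example, the 2.1 ratio is the default recalled above (the
yarn.nodemanager.vmem-pmem-ratio setting on your grid may differ), and the
256MB/384MB floors are the rule of thumb from point 3, not hard limits.

```python
# Rough sizing helper; all names here are hypothetical, not Samza or YARN APIs.
VMEM_PMEM_RATIO = 2.1      # assumed YARN default, as recalled above
MIN_XMX_MB = 256           # rule-of-thumb floors from point 3
MIN_CONTAINER_MB = 384


def size_container(container_mb, xmx_mb, vmem_ratio=VMEM_PMEM_RATIO):
    """Return the derived limits and any warnings for a container size."""
    if xmx_mb > container_mb:
        raise ValueError("heap (Xmx) cannot exceed the container size")
    warnings = []
    if xmx_mb < MIN_XMX_MB:
        warnings.append("Xmx below 256MB is hard to get right")
    if container_mb < MIN_CONTAINER_MB:
        warnings.append("container below 384MB is hard to get right")
    return {
        # virtual memory limit YARN would derive from the physical allocation
        "vmem_limit_mb": round(container_mb * vmem_ratio),
        # what is left in the container beyond the heap (page cache, JVM overhead)
        "headroom_mb": container_mb - xmx_mb,
        "warnings": warnings,
    }


print(size_container(1024, 256))
print(size_container(320, 200))
```

For the 1G container with a 256M heap from point 2, this gives a virtual
memory limit of about 2150MB and 768MB of headroom, with no warnings; the
320MB example trips both floors.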

The AM actually uses a fair amount of memory because of the dashboard,
which uses Scalatra and Scalate. These two end up chewing through a lot
of memory when you view the dashboard in YARN. We were running the AM's
YARN container at 768MB, and still seeing the NM kill the jobs
occasionally. I'd recommend leaving the AM as it is, unless you're really
pressed for memory in your YARN grid.
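
Putting the advice in this message together, a job's memory settings might
be rendered along these lines. Treat it as a sketch under assumptions: the
helper is hypothetical, the keys task.opts and yarn.am.container.memory.mb
are assumed to match your Samza version (only yarn.container.memory.mb is
named above), and the sizes are arbitrary example values. -Xmx, -Xmn and
-XX:+UseConcMarkSweepGC are standard HotSpot flags, though CMS is only
available on JVMs older than JDK 14.

```python
# Sketch only: renders Samza job properties as text. The property keys are
# assumptions; check the configuration reference for your Samza version.
def render_memory_config(container_mb, xmx_mb, young_gen_mb, am_mb=1024):
    """Build properties lines for a small container with CMS enabled."""
    if not young_gen_mb < xmx_mb <= container_mb:
        raise ValueError("expected young gen < Xmx <= container size")
    jvm_opts = " ".join([
        f"-Xmx{xmx_mb}m",
        f"-Xmn{young_gen_mb}m",          # fixed young generation size
        "-XX:+UseConcMarkSweepGC",       # CMS; removed in JDK 14 and later
    ])
    return "\n".join([
        f"task.opts={jvm_opts}",
        f"yarn.container.memory.mb={container_mb}",
        f"yarn.am.container.memory.mb={am_mb}",  # AM left at its 1GB default
    ])


print(render_memory_config(container_mb=512, xmx_mb=256, young_gen_mb=128))
```

The AM size defaults to 1024 here so that shrinking the task containers
does not also shrink the AM.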

Cheers,
Chris
