Posted to user@flink.apache.org by Avihai Berkovitz <Av...@microsoft.com> on 2016/12/13 16:41:36 UTC

Standalone cluster layout

Hi folks,

I am setting up a Flink cluster for testing, and I have a few questions regarding memory allocations:

  1.  Is there a recommended limit to the size of a TaskManager heap? I saw that Flink uses G1GC, so we can use dozens of GB.
  2.  Following the above question, should I use only one TaskManager process per machine, and give it all the available memory (minus a couple of GB for the OS)?
  3.  Should I reserve some memory for RocksDB? The partitioned state will be around 500GB in size, and to my understanding RocksDB runs in native code and so uses off-heap memory.
  4.  What is the recommended heap size of a JobManager? I expect that the cluster will run only 2 jobs at the same time.

The planned layout of the standalone cluster is:

  *   3 small JobManager machines, running:
     *   1 process of Zookeeper peer
     *   1 JobManager process
  *   N large TaskManager machines, each running 1 TM process
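As an illustration only, a minimal sketch of the standalone HA pieces this layout implies; the host names are placeholders and the key names below are the newer (1.2+) spellings, with older releases using the recovery.* equivalents, so treat it as an assumption rather than a verified configuration:

    # conf/masters - one line per JobManager (host:webUIPort)
    jm-1:8081
    jm-2:8081
    jm-3:8081

    # flink-conf.yaml - ZooKeeper-based HA (key names vary by release)
    high-availability: zookeeper
    high-availability.zookeeper.quorum: jm-1:2181,jm-2:2181,jm-3:2181
    high-availability.storageDir: hdfs:///flink/ha/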

Thanks!
Avihai


RE: Standalone cluster layout

Posted by Avihai Berkovitz <Av...@microsoft.com>.
Thank you for the answers. The cluster will run in Azure, so I will be using HDFS over Azure Blob Store, as outlined in http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Azure-Blob-Storage-Connector-td8536.html
I got pretty good performance in my tests, and it should handle scaling well. We will see how it performs under real production loads.
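For reference, a sketch of what the checkpoint target could look like once the hadoop-azure (wasb://) filesystem from the linked thread is on the classpath; the account, container, and path are hypothetical placeholders:

    # flink-conf.yaml - checkpoints written to Azure Blob Storage via the wasb:// scheme
    state.backend.fs.checkpointdir: wasb://checkpoints@myaccount.blob.core.windows.net/flink/checkpoints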

From: Robert Metzger [mailto:rmetzger@apache.org]
Sent: Wednesday, December 14, 2016 4:59 PM
To: user@flink.apache.org
Subject: Re: Standalone cluster layout

Hi Avihai,

1. As much as possible (I would leave the operating system at least 1 GB of memory). It also depends on the workload you have. For streaming workloads with very small state, you can use Flink with 1-2 GB of heap space and still get very good performance.
2. Yes, I would recommend running one large TaskManager per machine, because you save on "management overhead" and you benefit from faster local data transfers.
3. If you give your TaskManagers, say, 10 GB of heap, it's likely that the process in the OS is using ~12 GB of memory in total (our network stack also uses some off-heap memory). You can fine-tune the memory behavior of RocksDB, but by default it's not using a lot of memory.

4. I would give it at least 2 GB; if you run multiple jobs or larger jobs (high parallelism, many machines, many tasks), then maybe even more.
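To make the sizing advice above concrete, a hedged flink-conf.yaml sketch for a hypothetical machine with 16 GB of RAM; the numbers only illustrate the ratios described above, and the heap keys shown are the ones used at the time (newer releases renamed them):

    # flink-conf.yaml - illustrative values only
    taskmanager.heap.mb: 10240        # leaves headroom for off-heap use (network stack, RocksDB) and the OS
    taskmanager.numberOfTaskSlots: 8  # e.g. one slot per CPU core on the machine
    jobmanager.heap.mb: 2048          # at least 2 GB, more for many or larger jobs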


The layout of the standalone cluster looks good.
Where are you planning to write the state checkpoints to? Given that you have 500 GB of state, you should consider how to store that state somewhere reliably. For larger states, it's recommended to have a good network connection between the workers (the machines running TMs) and the distributed file system (say S3, HDFS, ...).
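As an illustration of wiring the RocksDB state to a reliable checkpoint target, a minimal sketch; the HDFS path is a placeholder, and whether the rocksdb shorthand is accepted in flink-conf.yaml depends on the Flink version (it can otherwise be set per job via env.setStateBackend(...)):

    # flink-conf.yaml
    state.backend: rocksdb
    state.backend.fs.checkpointdir: hdfs://namenode:8020/flink/checkpoints  # durable location shared by all TaskManagers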



On Tue, Dec 13, 2016 at 5:41 PM, Avihai Berkovitz <Av...@microsoft.com> wrote:
Hi folks,

I am setting up a Flink cluster for testing, and I have a few questions regarding memory allocations:

  1.  Is there a recommended limit to the size of a TaskManager heap? I saw that Flink uses G1GC, so we can use dozens of GB.
  2.  Following the above question, should I use only one TaskManager process per machine, and give it all the available memory (minus a couple of GB for the OS)?
  3.  Should I reserve some memory for RocksDB? The partitioned state will be around 500GB in size, and to my understanding RocksDB runs in native code and so uses off-heap memory.
  4.  What is the recommended heap size of a JobManager? I expect that the cluster will run only 2 jobs at the same time.

The planned layout of the standalone cluster is:

  *   3 small JobManager machines, running:

     *   1 process of Zookeeper peer
     *   1 JobManager process

  *   N large TaskManager machines, each running 1 TM process

Thanks!
Avihai



Re: Standalone cluster layout

Posted by Robert Metzger <rm...@apache.org>.
Hi Avihai,

1. As much as possible (I would leave the operating system at least 1 GB of
memory). It also depends on the workload you have. For streaming workloads
with very small state, you can use Flink with 1-2 GB of heap space and
still get very good performance.
2. Yes, I would recommend running one large TaskManager per machine, because
you save on "management overhead" and you benefit from faster local data
transfers.
3. If you give your TaskManagers, say, 10 GB of heap, it's likely that the
process in the OS is using ~12 GB of memory in total (our network stack
also uses some off-heap memory). You can fine-tune the memory behavior of
RocksDB, but by default it's not using a lot of memory.

4. I would give it at least 2 GB; if you run multiple jobs or larger jobs
(high parallelism, many machines, many tasks), then maybe even more.


The layout of the standalone cluster looks good.
Where are you planning to write the state checkpoints to? Given that you
have 500 GB of state, you should consider how to store that state
somewhere reliably. For larger states, it's recommended to have a good
network connection between the workers (the machines running TMs) and the
distributed file system (say S3, HDFS, ...).



On Tue, Dec 13, 2016 at 5:41 PM, Avihai Berkovitz <
Avihai.Berkovitz@microsoft.com> wrote:

> Hi folks,
>
>
>
> I am setting up a Flink cluster for testing, and I have a few questions
> regarding memory allocations:
>
>    1. Is there a recommended limit to the size of a TaskManager heap? I
>    saw that Flink uses G1GC, so we can use dozens of GB.
>    2. Following the above question, should I use only one TaskManager
>    process per machine, and give it all the available memory (minus a couple
>    of GB for the OS)?
>    3. Should I reserve some memory for RocksDB? The partitioned state
>    will be around 500GB in size, and to my understanding RocksDB runs in
>    native code and so uses off-heap memory.
>    4. What is the recommended heap size of a JobManager? I expect that
>    the cluster will run only 2 jobs at the same time.
>
>
>
> The planned layout of the standalone cluster is:
>
>    - 3 small JobManager machines, running:
>       - 1 process of Zookeeper peer
>       - 1 JobManager process
>    - N large TaskManager machines, each running 1 TM process
>
>
>
> Thanks!
>
> Avihai
>
>
>