Posted to user@flink.apache.org by 刘建刚 <li...@gmail.com> on 2019/11/20 07:08:20 UTC

How to estimate the memory size of flink state

      We are using Flink 1.6.2 with the filesystem state backend, and we want
to monitor the size of the state held in memory. If the state grows too
large, we would like to be notified so that we can take measures such as
rescaling the job; otherwise the job may fail because it runs out of memory.
      We have tried monitoring JVM memory usage, for example GC throughput.
In our case, however, the state size can vary greatly at peak times, so we
would rather look at the in-memory state size directly.
      I checked the metrics and the code but did not find anything that
reports the in-memory state size. I can get the checkpoint size, but that
is the size of the serialized result and does not reflect the running state
in memory. Can anyone give me some suggestions? Thank you very much.

Re: How to estimate the memory size of flink state

Posted by 刘建刚 <li...@gmail.com>.
      Thank you. Your suggestion is good and it helps a lot. In my case, I want to know the in-memory state size for a different reason.
      When GC pressure is high, I need to either throttle the source or discard some of its data to keep the job running. If the state is large, I need to discard data; if it is not, I need to throttle the source. The state size indicates the resident memory. With event time, discarding data can reduce memory usage.
      Could you please give me some suggestions?
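
The decision rule described above could be sketched roughly as follows. This is only an illustration: the class name OverloadPolicy, the Action enum, and both thresholds are hypothetical, and the estimated state size is assumed to come from some external estimate, since the thread does not identify a Flink metric for it.

```java
/** Hypothetical sketch: pick a reaction from GC pressure and estimated state size. */
final class OverloadPolicy {
    /** Possible reactions when the job is under memory pressure. */
    enum Action { NONE, THROTTLE_SOURCE, DISCARD_DATA }

    // Assumed threshold: fraction of wall-clock time spent in GC (0.0 to 1.0).
    private final double gcPressureThreshold;
    // Assumed threshold: estimated in-memory state size in bytes.
    private final long stateSizeThresholdBytes;

    OverloadPolicy(double gcPressureThreshold, long stateSizeThresholdBytes) {
        this.gcPressureThreshold = gcPressureThreshold;
        this.stateSizeThresholdBytes = stateSizeThresholdBytes;
    }

    /**
     * Low GC pressure: do nothing.
     * High GC pressure with large state: discard data.
     * High GC pressure with small state: throttle the source.
     */
    Action decide(double gcPressure, long estimatedStateBytes) {
        if (gcPressure < gcPressureThreshold) {
            return Action.NONE;
        }
        return estimatedStateBytes >= stateSizeThresholdBytes
                ? Action.DISCARD_DATA
                : Action.THROTTLE_SOURCE;
    }
}
```

For example, with new OverloadPolicy(0.2, 1000L), a call to decide(0.5, 10L) selects THROTTLE_SOURCE, while decide(0.5, 5000L) selects DISCARD_DATA.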

> On Nov 20, 2019, at 3:16 PM, sysukelee <sy...@gmail.com> wrote:
> 
> Hi Liu,
> We monitor the JVM's used/max heap memory to decide whether to rescale the job.
> To avoid problems caused by OOM, you don't need to know exactly how much memory is used by state.
> Focusing on JVM memory usage is more reasonable.
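
The used/max heap check suggested in the quoted reply can be sketched with the JDK's standard MemoryMXBean. The class name HeapMonitor, its method names, and the threshold passed to shouldRescale are hypothetical and chosen only for illustration. Flink also reports heap usage through its system metrics (Status.JVM.Memory.Heap.Used and Status.JVM.Memory.Heap.Max), which may be a better fit for external alerting.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical sketch: compare used heap to max heap to decide whether to rescale. */
final class HeapMonitor {

    /** Returns used/max, or NaN when the JVM reports no defined maximum. */
    static double usedRatio(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return Double.NaN; // MemoryUsage.getMax() is -1 when undefined
        }
        return (double) usedBytes / maxBytes;
    }

    /** Samples the current heap usage of this JVM. */
    static double currentUsedRatio() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return usedRatio(heap.getUsed(), heap.getMax());
    }

    /** True when the ratio is known and at or above the assumed threshold. */
    static boolean shouldRescale(double ratio, double threshold) {
        return !Double.isNaN(ratio) && ratio >= threshold;
    }
}
```

A single sample can be high shortly before a garbage collection, so averaging several samples or looking at usage after GC is probably more robust than reacting to one reading.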