Posted to user@beam.apache.org by 👌👌 <11...@qq.com> on 2019/11/21 09:26:15 UTC

Re: Memory profiling on Dataflow with java

I also can't save the heap dump; I haven't found any way to do it.
So I still haven't solved the problem.
Thanks for your answer!




------------------ Original Message ------------------
From: "Frantisek Csajka" <csferi27@gmail.com>
Sent: Thursday, November 21, 2019, 5:18 PM
To: "user" <user@beam.apache.org>

Subject: Re: Memory profiling on Dataflow with java



Hi,

Have you succeeded in saving a heap dump? I also ran into this a while ago and was not able to save a heap dump or increase the boot disk size. If you have any update on this, could you please share?


Thanks in advance,
Frantisek


On Wed, Nov 20, 2019 at 1:46 AM Luke Cwik <lcwik@google.com> wrote:

You might want to reach out to Cloud support for help with debugging this and/or guidance on how to debug it.

On Mon, Nov 18, 2019 at 10:56 AM Jeff Klukas <jklukas@mozilla.com> wrote:

On Mon, Nov 18, 2019 at 1:32 PM Reynaldo Baquerizo <reynaldo.micheline@bairesdev.com> wrote:


Does it mean anything that the GCP console does not show the --dumpHeapOnOOM and --saveHeapDumpsToGcsPath options of a running job under PipelineOptions (it does show diskSizeGb)?






That's normal; I also never saw those heap dump options displayed in the Dataflow UI. I think Dataflow doesn't show any options that originate from "Debug" options interfaces.



On Mon, Nov 18, 2019 at 11:59 AM Jeff Klukas <jklukas@mozilla.com> wrote:

Using default Dataflow workers, this is the set of options I passed:



--dumpHeapOnOOM --saveHeapDumpsToGcsPath=$MYBUCKET/heapdump --diskSizeGb=100
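
A minimal sketch (not from the thread itself) of how those flags can be picked up in a pipeline's main() with the Beam Java SDK; the class name and bucket path are placeholders, and the interfaces are the standard Dataflow ones:

import org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions;
import org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class HeapDumpOptionsExample {
  public static void main(String[] args) {
    // Launched with e.g.:
    //   --runner=DataflowRunner --dumpHeapOnOOM
    //   --saveHeapDumpsToGcsPath=$MYBUCKET/heapdump --diskSizeGb=100
    DataflowPipelineDebugOptions debugOptions =
        PipelineOptionsFactory.fromArgs(args)
            .withValidation()
            .as(DataflowPipelineDebugOptions.class);

    // diskSizeGb lives on the worker-pool options; the same options object
    // can be viewed through that interface too (getDiskSizeGb() would
    // return 100 here).
    DataflowPipelineWorkerPoolOptions workerOptions =
        debugOptions.as(DataflowPipelineWorkerPoolOptions.class);

    Pipeline pipeline = Pipeline.create(debugOptions);
    // ... build the transforms here, then pipeline.run() ...
  }
}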




On Mon, Nov 18, 2019 at 11:57 AM Jeff Klukas <jklukas@mozilla.com> wrote:

It sounds like you're generally doing the right thing. I've successfully used --saveHeapDumpsToGcsPath in a Java pipeline running on Dataflow and inspected the results in Eclipse MAT.


I think that --saveHeapDumpsToGcsPath will automatically turn on --dumpHeapOnOOM, but it's worth setting that explicitly too.
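
For what it's worth, a minimal sketch of setting both explicitly in code instead of relying on the implication; "options" is assumed to be an existing PipelineOptions instance and the GCS path is a placeholder:

// Setter names follow the usual Beam getter/setter convention on
// DataflowPipelineDebugOptions.
DataflowPipelineDebugOptions debugOptions = options.as(DataflowPipelineDebugOptions.class);
debugOptions.setDumpHeapOnOOM(true);
debugOptions.setSaveHeapDumpsToGcsPath("gs://my-bucket/heapdump"); // placeholder bucket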



Are your boot disks large enough to store the heap dumps? The docs for getSaveHeapDumpsToGcsPath [0] mention: "CAUTION: This option implies dumpHeapOnOOM, and has similar caveats. Specifically, heap dumps can be of comparable size to the default boot disk. Consider increasing the boot disk size before setting this flag to true."


When I've done this in the past, I definitely had to increase boot disk size (though I forget now what the relevant Dataflow option was).



[0] https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.html


On Mon, Nov 18, 2019 at 11:35 AM Reynaldo Baquerizo <reynaldo.micheline@bairesdev.com> wrote:

Hi all,


We are running into OOM issues with one of our pipelines. They are not reproducible with DirectRunner, only with Dataflow.
I tried --saveHeapDumpsToGcsPath, but it does not save any heap dump (MyOptions extends DataflowPipelineDebugOptions)
I looked at the java process inside the docker container and it has remote jmx enabled through port 5555, but outside traffic is firewalled.
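
For illustration only (the actual MyOptions interface isn't shown in this thread), an options interface along those lines might look like the sketch below; dumpHeapOnOOM and saveHeapDumpsToGcsPath are inherited from DataflowPipelineDebugOptions, so they only need to be supplied as flags at launch time:

import org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions;
import org.apache.beam.sdk.options.Description;

// Hypothetical options interface for the pipeline described above.
public interface MyOptions extends DataflowPipelineDebugOptions {
  @Description("Example pipeline-specific option (hypothetical).")
  String getInputPath();
  void setInputPath(String value);
}

The pipeline would then obtain it with PipelineOptionsFactory.fromArgs(args).withValidation().as(MyOptions.class) and pass it to Pipeline.create(...).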


Beam SDK: 2.15.0


Any ideas?


Cheers,
--
Reynaldo

Re: Memory profiling on Dataflow with java

Posted by 👌👌 <11...@qq.com>.
I run Beam on Spark. Because I can't write a UDF for the partitioner function, I don't have any ideas for how to solve it.



