Posted to dev@beam.apache.org by Sahith Nallapareddy via dev <de...@beam.apache.org> on 2023/06/01 14:56:09 UTC

Potential Memory Leak in Dataflow using External transform

Hello,

I am reporting an issue that appears to be a memory leak when using
external transforms, especially with large amounts of data flowing
through the transform itself. I have had a Google support ticket open
for about a month now, but it does not seem to be moving forward. I ran
two batch jobs in Dataflow: one applying the external transform through
the expansion service, and one with the external transform's source
copied into the job and its buildExternal method called directly. The
jobs are identical in every other respect. Memory, however, rises
continuously in the job that uses the external transform. I am
attaching images of the memory charts; my support case number is
44543721. These jobs usually write to Bigtable, and, perhaps because of
this issue, the job with the external transform started deadlocking and
got stuck with the warning "PTransform not outputting". For cost
reasons I did not want to replicate that job exactly, so I am not sure
whether the deadlock is related (for example, whether this memory leak
causes some sort of resource contention).
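
For reference, here is a rough sketch of how the two jobs differ. All
of the names below (the builder class, its configuration, the URN, and
the expansion service address) are placeholders rather than our actual
code:

import org.apache.beam.runners.core.construction.External;
import org.apache.beam.sdk.values.PCollection;

// Job A: apply the transform through the expansion service. (External
// lived in runners.core.construction in the Beam version we use; the
// package has moved in newer releases.)
PCollection<?> outputA =
    input.apply(
        "ViaExpansionService",
        External.of(
            "beam:transform:my_org:my_transform:v1", // placeholder URN
            configPayloadBytes,                      // serialized config
            "localhost:8097"));                      // expansion address

// Job B: the builder source (an ExternalTransformBuilder
// implementation) is copied into the job and invoked directly,
// bypassing the expansion service and the extra SDK harness container
// on each worker VM.
MyTransformBuilder builder = new MyTransformBuilder();
PCollection<?> outputB =
    input.apply("Inlined", builder.buildExternal(myConfiguration));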

Thanks,

Sahith
[image: Screen Shot 2023-06-01 at 10.53.08 AM.png]
[image: Screen Shot 2023-06-01 at 10.53.31 AM.png]

Re: Potential Memory Leak in Dataflow using External transform

Posted by Sahith Nallapareddy via dev <de...@beam.apache.org>.
Hello,

Thank you, I will try that. Would you still expect memory to rise like
that over the course of a job? Unfortunately, I no longer have the
original job that was deadlocking, since it expired in Dataflow, but we
did try larger machines there (Dataflow Prime with a high minimum RAM
setting, and a larger worker machine type), and memory still seemed to
climb until it hit the limit. I think on Monday I'll just try rerunning
the job that actually writes to Bigtable so I can see that behavior
again. Unfortunately, reproducing the "PTransform not outputting"
behavior cost a lot of money, which is why I avoided it, but it seems
relevant to getting the case solved.
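
For what it's worth, the sizing options we tried looked roughly like
this (a sketch from memory; the project and region values are
placeholders, and the exact flag spellings may differ across
Beam/Dataflow versions):

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Sketch from memory, not our exact launch code.
DataflowPipelineOptions options =
    PipelineOptionsFactory.fromArgs(
            "--runner=DataflowRunner",
            "--project=my-project",                  // placeholder
            "--region=us-central1",                  // placeholder
            "--workerMachineType=n1-highmem-8",      // larger workers
            "--dataflowServiceOptions=enable_prime", // Dataflow Prime
            "--resourceHints=min_ram=32GB")          // high min RAM
        .as(DataflowPipelineOptions.class);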

Thanks,

Sahith

Re: Potential Memory Leak in Dataflow using External transform

Posted by Chamikara Jayalath via dev <de...@beam.apache.org>.
Using the external transform will start up additional containers on
each VM. This could be the reason for OOMs (not necessarily a memory
leak), but I haven't looked at your specific job. Have you tried
running with higher-memory VMs, or tried the
"--experiment=no_use_multiple_sdk_containers" option?
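
For example, something roughly like this on the Java side (a sketch,
not tested against your job; the other SDKs have equivalent flags):

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Run a single shared SDK container per VM instead of one per core,
// and/or use a higher-memory machine type. Values here are examples.
DataflowPipelineOptions options =
    PipelineOptionsFactory.fromArgs(
            "--runner=DataflowRunner",
            "--experiments=no_use_multiple_sdk_containers",
            "--workerMachineType=n1-highmem-8") // example machine type
        .as(DataflowPipelineOptions.class);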

Given that you have a GCP support ticket, it's probably best to
communicate there (so that folks with the correct permissions can
actually look at your job).

Thanks,
Cham


Re: Potential Memory Leak in Dataflow using External transform

Posted by Svetak Sundhar via dev <de...@beam.apache.org>.
+Bruno Volpato <bv...@google.com> who is looking into this issue
internally.


Svetak Sundhar
Data Engineer
svetaksundhar@google.com


