Posted to user@beam.apache.org by Michel Davit <mi...@spotify.com> on 2022/05/20 14:50:58 UTC

Dataflow runner and IntrinsicMapTaskExecutor instantiation

Hi there 👋

I have a question regarding the Dataflow runner, more precisely about its
behavior when instantiating IntrinsicMapTaskExecutor.

Running the same job with different SDK versions and analyzing the heap
dumps, I've noticed some differences:
beam java SDK 2.29.0: *11* instances
beam java SDK 2.35.0: *45* instances
beam java SDK 2.35.0 with runner v2: *47* instances

For these test jobs, I run with only 1 worker of type n1-standard-1. In all
cases, only 4 task executors are actually started and assigned to a thread.
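
For reference, this is roughly the kind of options the test jobs run with (a
sketch only; the project and region values are placeholders, not the real job
configuration):

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    // Pin the job to a single n1-standard-1 worker (illustrative values).
    DataflowPipelineOptions options =
        PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);
    options.setProject("my-project");              // placeholder
    options.setRegion("europe-west1");             // placeholder
    options.setNumWorkers(1);
    options.setMaxNumWorkers(1);                   // no autoscaling beyond one worker
    options.setWorkerMachineType("n1-standard-1");
    Pipeline pipeline = Pipeline.create(options);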

Creating an IntrinsicMapTaskExecutor is not 'free': it involves
duplicating the whole coder stack. We see a significant increase in memory
consumption as well as a longer startup time, due to the
CloudObjects.coderFromCloudObject operation.

Is there a reason why Dataflow creates so many unused executors?

Cheers
-- 



Michel Davit
Data Engineer
Spotify France | 54 Rue de Londres | 75008 Paris, France

Re: Dataflow runner and IntrinsicMapTaskExecutor instantiation

Posted by Luke Cwik <lc...@google.com>.
I believe there is an issue with how you are launching your Runner V2
pipelines, as there should be no instances of IntrinsicMapTaskExecutor
within the SDK worker.

With Runner v2 there is still a process of creating something similar,
though, and coder instances are created as well.
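
For what it's worth, here is a minimal sketch of how Runner v2 is usually
switched on from the Java SDK (this assumes the documented use_runner_v2
experiment; if the experiment is missing, the job could silently run on the
legacy harness, which would explain seeing IntrinsicMapTaskExecutor at all):

    import java.util.Arrays;
    import org.apache.beam.sdk.options.ExperimentalOptions;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    // Sketch: enable Runner v2 via the experiments option, equivalent to
    // passing --experiments=use_runner_v2 on the command line.
    PipelineOptions options = PipelineOptionsFactory.create();
    options.as(ExperimentalOptions.class)
        .setExperiments(Arrays.asList("use_runner_v2"));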

On Fri, May 20, 2022 at 9:38 AM Robert Bradshaw <ro...@google.com> wrote:

> On Fri, May 20, 2022 at 7:51 AM Michel Davit <mi...@spotify.com> wrote:
> >
> > Hi there 👋
> >
> > I have a question regarding the Dataflow runner, more precisely about its
> behavior when instantiating IntrinsicMapTaskExecutor.
> >
> > Running the same job with different SDK versions and analyzing the heap
> dumps, I've noticed some differences:
> > beam java SDK 2.29.0: 11 instances
> > beam java SDK 2.35.0: 45 instances
> > beam java SDK 2.35.0 with runner v2: 47 instances
> >
> > On these test jobs, I run with only 1 worker of type n1-standard-1. In
> all cases, only 4 task executors are started and assigned to a thread.
>
> In streaming, the number of threads is generally much higher than the
> number of cores, and these get created on demand. Is your peak
> concurrency much lower?
>
> > Creating an IntrinsicMapTaskExecutor is not 'free': it involves
> duplicating the whole coder stack. We see a significant increase in memory
> consumption as well as a longer startup time, due to the
> CloudObjects.coderFromCloudObject operation.
>
> IIRC, we memoize coder creation on Runner v2 to cut down on this
> overhead. (Maybe that's just in Python.)
>
> > Is there a reason why Dataflow creates so many unused executors?
> >
> > Cheers
> > --
> >
> >
> >
> > Michel Davit
> > Data Engineer
> > Spotify France | 54 Rue de Londres | 75008 Paris, France
>

Re: Dataflow runner and IntrinsicMapTaskExecutor instantiation

Posted by Robert Bradshaw <ro...@google.com>.
On Fri, May 20, 2022 at 7:51 AM Michel Davit <mi...@spotify.com> wrote:
>
> Hi there 👋
>
> I have a question regarding the Dataflow runner, more precisely about its behavior when instantiating IntrinsicMapTaskExecutor.
>
> Running the same job with different SDK versions and analyzing the heap dumps, I've noticed some differences:
> beam java SDK 2.29.0: 11 instances
> beam java SDK 2.35.0: 45 instances
> beam java SDK 2.35.0 with runner v2: 47 instances
>
> On these test jobs, I run with only 1 worker of type n1-standard-1. In all cases, only 4 task executors are started and assigned to a thread.

In streaming, the number of threads is generally much higher than the
number of cores, and these get created on demand. Is your peak
concurrency much lower?
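
If those extra instances do correspond to on-demand work threads, one knob to
experiment with is the harness thread count (a sketch assuming the standard
DataflowPipelineDebugOptions; whether it actually bounds executor creation in
your case is worth verifying):

    import org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    // Sketch: cap the number of worker harness threads, which limits how many
    // work items run concurrently on a worker (and, presumably, how many
    // map task executors get created for them).
    DataflowPipelineDebugOptions debugOptions =
        PipelineOptionsFactory.create().as(DataflowPipelineDebugOptions.class);
    debugOptions.setNumberOfWorkerHarnessThreads(4);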

> Creating an IntrinsicMapTaskExecutor is not 'free': it involves duplicating the whole coder stack. We see a significant increase in memory consumption as well as a longer startup time, due to the CloudObjects.coderFromCloudObject operation.

IIRC, we memoize coder creation on Runner v2 to cut down on this
overhead. (Maybe that's just in Python.)
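
To make the idea concrete, the memoization amounts to something like the
following (a generic sketch, not the actual Dataflow worker code; the
string-keyed cache and the way the coder spec is passed in are simplified
assumptions):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;
    import org.apache.beam.sdk.coders.Coder;

    // Build each coder from its serialized spec at most once and reuse it
    // across executors, instead of rebuilding the whole coder stack per task.
    class MemoizingCoderFactory {
      private final Map<String, Coder<?>> cache = new ConcurrentHashMap<>();
      // e.g. a function wrapping CloudObjects.coderFromCloudObject
      private final Function<String, Coder<?>> buildFromSpec;

      MemoizingCoderFactory(Function<String, Coder<?>> buildFromSpec) {
        this.buildFromSpec = buildFromSpec;
      }

      Coder<?> get(String coderSpecJson) {
        // computeIfAbsent deserializes the spec only on the first request.
        return cache.computeIfAbsent(coderSpecJson, buildFromSpec);
      }
    }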

> Is there a reason why Dataflow creates so many unused executors?
>
> Cheers
> --
>
>
>
> Michel Davit
> Data Engineer
> Spotify France | 54 Rue de Londres | 75008 Paris, France