You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by David Morin <mo...@gmail.com> on 2020/02/22 17:11:25 UTC

yarn session: one JVM per task

Hi,
My app is based on a lib that is not thread safe (yet...).
In waiting of the patch has been pushed, how can I be sure that my Sink that uses this lib is in one JVM ?
Context: I use one Yarn session and send my Flink jobs to this session

Regards,
David

Re: yarn session: one JVM per task

Posted by David Morin <mo...@gmail.com>.
Perfect.
No problem. My Bad. Not really clear.
Thanks !

Le mar. 25 févr. 2020 à 13:45, Xintong Song <to...@gmail.com> a
écrit :

> Ah, I misunderstood and thought that you want to keep all your Sink
> instances on the same TM.
>
> If what you want is to have one instance per TM, then as Gary mentioned
> specifying "-s 1" at starting the session would be enough, and it should
> work with all existing versions above (including) 1.8.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Feb 25, 2020 at 7:41 PM David Morin <mo...@gmail.com>
> wrote:
>
>> Hi Gary,
>>
>> Sorry I was probably not very clear.
>> Yes that's exactly what I want to hear :)
>> I use the -s 1 parameter and what I expect to have is one task of my Sink
>> (one instance in fact) per TM (i.e. per JVM)
>> That's the current behaviour during my tests but I want to be sure.
>> Thanks a lot
>>
>> David
>>
>> Le mar. 25 févr. 2020 à 11:16, Gary Yao <ga...@apache.org> a écrit :
>>
>>> Hi David,
>>>
>>> Before with the both n and -s it was not the case.
>>>>
>>>
>>> What do you mean by before? At least in 1.8 "-s" could be used to
>>> specify the
>>> number of slots per TM.
>>>
>>>
>>> how can I be sure that my Sink that uses this lib is in one JVM ?
>>>>
>>>
>>> Is it enough that no other parallel instance of your sink runs in the
>>> same
>>> JVM? If that is the case, it is enough to start your your YARN session
>>> with:
>>>
>>>     ./bin/yarn-session.sh -s 1 [...]
>>>
>>> This will result in exactly one slot per TM. Note that a single slot may
>>> still
>>> hold several subtasks of the job (Slot Sharing) but never two parallel
>>> instances of your sink [2]. You can also control Slot Sharing manually
>>> [3].
>>>
>>>
>>> So, if I understand I have to keep this Flink release (1.9.2) ?
>>>>
>>>
>>> I don't see why 1.10.0 would not work for you.
>>>
>>>
>>> Best,
>>> Gary
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/yarn_setup.html#start-a-session
>>> [2]
>>> https://ci.apache.org/projects/flink/flink-docs-stable/concepts/runtime.html#task-slots-and-resources
>>> [3]
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/#task-chaining-and-resource-groups
>>>
>>> On Tue, Feb 25, 2020 at 10:28 AM David Morin <mo...@gmail.com>
>>> wrote:
>>>
>>>> Hi Xintong,
>>>>
>>>> At the moment I'm using the 1.9.2 with this command:
>>>>    yarn-session.sh -d *-s 1* -jm 4096 -tm 4096 -qu "XXX" -nm
>>>> "MyPipeline"
>>>> So, after a lot of tests, I've noticed that if I increase the
>>>> parallelism of my Custom Sink, each task is embedded into one TS and, the
>>>> most important, each one into one TaskManager (Yarn container in fact).
>>>> So, if I understand I have to keep this Flink release (1.9.2) ?
>>>>
>>>> Thanks
>>>> David
>>>>
>>>>
>>>>
>>>> Le mar. 25 févr. 2020 à 02:02, Xintong Song <to...@gmail.com> a
>>>> écrit :
>>>>
>>>>> Depending on your Flink version, the '-n' option might not take
>>>>> effect. It is removed in the latest release, but before that there were a
>>>>> few versions where this option is neither removed nor taking effect.
>>>>>
>>>>> Anyway, as long as you have multiple containers, I don't think there's
>>>>> a way to make some of the tasks scheduled to the same JVM. Not that I'm
>>>>> aware of.
>>>>>
>>>>>
>>>>> Thank you~
>>>>>
>>>>> Xintong Song
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 24, 2020 at 8:43 PM David Morin <mo...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thanks Xintong.
>>>>>> I've noticed than when I use yarn-session.sh with --slots (-s)
>>>>>> parameter but without --container (-n) it creates one task/slot per
>>>>>> taskmanager. Before with the both n and -s it was not the case.
>>>>>> I prefer to use only small container with only one task to scale my
>>>>>> pipeline and of course to prevent from thread-safe issue
>>>>>> Do you think I cannot be confident on that behaviour ?
>>>>>>
>>>>>> Regards,
>>>>>> David
>>>>>>
>>>>>> On 2020/02/22 17:11:25, David Morin <mo...@gmail.com>
>>>>>> wrote:
>>>>>> > Hi,
>>>>>> > My app is based on a lib that is not thread safe (yet...).
>>>>>> > In waiting of the patch has been pushed, how can I be sure that my
>>>>>> Sink that uses this lib is in one JVM ?
>>>>>> > Context: I use one Yarn session and send my Flink jobs to this
>>>>>> session
>>>>>> >
>>>>>> > Regards,
>>>>>> > David
>>>>>> >
>>>>>>
>>>>>

Re: yarn session: one JVM per task

Posted by Xintong Song <to...@gmail.com>.
Ah, I misunderstood and thought that you want to keep all your Sink
instances on the same TM.

If what you want is to have one instance per TM, then as Gary mentioned
specifying "-s 1" at starting the session would be enough, and it should
work with all existing versions above (including) 1.8.

Thank you~

Xintong Song



On Tue, Feb 25, 2020 at 7:41 PM David Morin <mo...@gmail.com>
wrote:

> Hi Gary,
>
> Sorry I was probably not very clear.
> Yes that's exactly what I want to hear :)
> I use the -s 1 parameter and what I expect to have is one task of my Sink
> (one instance in fact) per TM (i.e. per JVM)
> That's the current behaviour during my tests but I want to be sure.
> Thanks a lot
>
> David
>
> Le mar. 25 févr. 2020 à 11:16, Gary Yao <ga...@apache.org> a écrit :
>
>> Hi David,
>>
>> Before with the both n and -s it was not the case.
>>>
>>
>> What do you mean by before? At least in 1.8 "-s" could be used to specify
>> the
>> number of slots per TM.
>>
>>
>> how can I be sure that my Sink that uses this lib is in one JVM ?
>>>
>>
>> Is it enough that no other parallel instance of your sink runs in the same
>> JVM? If that is the case, it is enough to start your your YARN session
>> with:
>>
>>     ./bin/yarn-session.sh -s 1 [...]
>>
>> This will result in exactly one slot per TM. Note that a single slot may
>> still
>> hold several subtasks of the job (Slot Sharing) but never two parallel
>> instances of your sink [2]. You can also control Slot Sharing manually
>> [3].
>>
>>
>> So, if I understand I have to keep this Flink release (1.9.2) ?
>>>
>>
>> I don't see why 1.10.0 would not work for you.
>>
>>
>> Best,
>> Gary
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/yarn_setup.html#start-a-session
>> [2]
>> https://ci.apache.org/projects/flink/flink-docs-stable/concepts/runtime.html#task-slots-and-resources
>> [3]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/#task-chaining-and-resource-groups
>>
>> On Tue, Feb 25, 2020 at 10:28 AM David Morin <mo...@gmail.com>
>> wrote:
>>
>>> Hi Xintong,
>>>
>>> At the moment I'm using the 1.9.2 with this command:
>>>    yarn-session.sh -d *-s 1* -jm 4096 -tm 4096 -qu "XXX" -nm
>>> "MyPipeline"
>>> So, after a lot of tests, I've noticed that if I increase the
>>> parallelism of my Custom Sink, each task is embedded into one TS and, the
>>> most important, each one into one TaskManager (Yarn container in fact).
>>> So, if I understand I have to keep this Flink release (1.9.2) ?
>>>
>>> Thanks
>>> David
>>>
>>>
>>>
>>> Le mar. 25 févr. 2020 à 02:02, Xintong Song <to...@gmail.com> a
>>> écrit :
>>>
>>>> Depending on your Flink version, the '-n' option might not take effect.
>>>> It is removed in the latest release, but before that there were a few
>>>> versions where this option is neither removed nor taking effect.
>>>>
>>>> Anyway, as long as you have multiple containers, I don't think there's
>>>> a way to make some of the tasks scheduled to the same JVM. Not that I'm
>>>> aware of.
>>>>
>>>>
>>>> Thank you~
>>>>
>>>> Xintong Song
>>>>
>>>>
>>>>
>>>> On Mon, Feb 24, 2020 at 8:43 PM David Morin <mo...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks Xintong.
>>>>> I've noticed than when I use yarn-session.sh with --slots (-s)
>>>>> parameter but without --container (-n) it creates one task/slot per
>>>>> taskmanager. Before with the both n and -s it was not the case.
>>>>> I prefer to use only small container with only one task to scale my
>>>>> pipeline and of course to prevent from thread-safe issue
>>>>> Do you think I cannot be confident on that behaviour ?
>>>>>
>>>>> Regards,
>>>>> David
>>>>>
>>>>> On 2020/02/22 17:11:25, David Morin <mo...@gmail.com>
>>>>> wrote:
>>>>> > Hi,
>>>>> > My app is based on a lib that is not thread safe (yet...).
>>>>> > In waiting of the patch has been pushed, how can I be sure that my
>>>>> Sink that uses this lib is in one JVM ?
>>>>> > Context: I use one Yarn session and send my Flink jobs to this
>>>>> session
>>>>> >
>>>>> > Regards,
>>>>> > David
>>>>> >
>>>>>
>>>>

Re: yarn session: one JVM per task

Posted by David Morin <mo...@gmail.com>.
Hi Gary,

Sorry I was probably not very clear.
Yes that's exactly what I want to hear :)
I use the -s 1 parameter and what I expect to have is one task of my Sink
(one instance in fact) per TM (i.e. per JVM)
That's the current behaviour during my tests but I want to be sure.
Thanks a lot

David

Le mar. 25 févr. 2020 à 11:16, Gary Yao <ga...@apache.org> a écrit :

> Hi David,
>
> Before with the both n and -s it was not the case.
>>
>
> What do you mean by before? At least in 1.8 "-s" could be used to specify
> the
> number of slots per TM.
>
>
> how can I be sure that my Sink that uses this lib is in one JVM ?
>>
>
> Is it enough that no other parallel instance of your sink runs in the same
> JVM? If that is the case, it is enough to start your your YARN session
> with:
>
>     ./bin/yarn-session.sh -s 1 [...]
>
> This will result in exactly one slot per TM. Note that a single slot may
> still
> hold several subtasks of the job (Slot Sharing) but never two parallel
> instances of your sink [2]. You can also control Slot Sharing manually [3].
>
>
> So, if I understand I have to keep this Flink release (1.9.2) ?
>>
>
> I don't see why 1.10.0 would not work for you.
>
>
> Best,
> Gary
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/yarn_setup.html#start-a-session
> [2]
> https://ci.apache.org/projects/flink/flink-docs-stable/concepts/runtime.html#task-slots-and-resources
> [3]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/#task-chaining-and-resource-groups
>
> On Tue, Feb 25, 2020 at 10:28 AM David Morin <mo...@gmail.com>
> wrote:
>
>> Hi Xintong,
>>
>> At the moment I'm using the 1.9.2 with this command:
>>    yarn-session.sh -d *-s 1* -jm 4096 -tm 4096 -qu "XXX" -nm "MyPipeline"
>> So, after a lot of tests, I've noticed that if I increase the parallelism
>> of my Custom Sink, each task is embedded into one TS and, the most
>> important, each one into one TaskManager (Yarn container in fact).
>> So, if I understand I have to keep this Flink release (1.9.2) ?
>>
>> Thanks
>> David
>>
>>
>>
>> Le mar. 25 févr. 2020 à 02:02, Xintong Song <to...@gmail.com> a
>> écrit :
>>
>>> Depending on your Flink version, the '-n' option might not take effect.
>>> It is removed in the latest release, but before that there were a few
>>> versions where this option is neither removed nor taking effect.
>>>
>>> Anyway, as long as you have multiple containers, I don't think there's a
>>> way to make some of the tasks scheduled to the same JVM. Not that I'm aware
>>> of.
>>>
>>>
>>> Thank you~
>>>
>>> Xintong Song
>>>
>>>
>>>
>>> On Mon, Feb 24, 2020 at 8:43 PM David Morin <mo...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks Xintong.
>>>> I've noticed than when I use yarn-session.sh with --slots (-s)
>>>> parameter but without --container (-n) it creates one task/slot per
>>>> taskmanager. Before with the both n and -s it was not the case.
>>>> I prefer to use only small container with only one task to scale my
>>>> pipeline and of course to prevent from thread-safe issue
>>>> Do you think I cannot be confident on that behaviour ?
>>>>
>>>> Regards,
>>>> David
>>>>
>>>> On 2020/02/22 17:11:25, David Morin <mo...@gmail.com> wrote:
>>>> > Hi,
>>>> > My app is based on a lib that is not thread safe (yet...).
>>>> > In waiting of the patch has been pushed, how can I be sure that my
>>>> Sink that uses this lib is in one JVM ?
>>>> > Context: I use one Yarn session and send my Flink jobs to this session
>>>> >
>>>> > Regards,
>>>> > David
>>>> >
>>>>
>>>

Re: yarn session: one JVM per task

Posted by Gary Yao <ga...@apache.org>.
Hi David,

Before with the both n and -s it was not the case.
>

What do you mean by before? At least in 1.8 "-s" could be used to specify
the
number of slots per TM.


how can I be sure that my Sink that uses this lib is in one JVM ?
>

Is it enough that no other parallel instance of your sink runs in the same
JVM? If that is the case, it is enough to start your your YARN session with:

    ./bin/yarn-session.sh -s 1 [...]

This will result in exactly one slot per TM. Note that a single slot may
still
hold several subtasks of the job (Slot Sharing) but never two parallel
instances of your sink [2]. You can also control Slot Sharing manually [3].


So, if I understand I have to keep this Flink release (1.9.2) ?
>

I don't see why 1.10.0 would not work for you.


Best,
Gary

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/yarn_setup.html#start-a-session
[2]
https://ci.apache.org/projects/flink/flink-docs-stable/concepts/runtime.html#task-slots-and-resources
[3]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/#task-chaining-and-resource-groups

On Tue, Feb 25, 2020 at 10:28 AM David Morin <mo...@gmail.com>
wrote:

> Hi Xintong,
>
> At the moment I'm using the 1.9.2 with this command:
>    yarn-session.sh -d *-s 1* -jm 4096 -tm 4096 -qu "XXX" -nm "MyPipeline"
> So, after a lot of tests, I've noticed that if I increase the parallelism
> of my Custom Sink, each task is embedded into one TS and, the most
> important, each one into one TaskManager (Yarn container in fact).
> So, if I understand I have to keep this Flink release (1.9.2) ?
>
> Thanks
> David
>
>
>
> Le mar. 25 févr. 2020 à 02:02, Xintong Song <to...@gmail.com> a
> écrit :
>
>> Depending on your Flink version, the '-n' option might not take effect.
>> It is removed in the latest release, but before that there were a few
>> versions where this option is neither removed nor taking effect.
>>
>> Anyway, as long as you have multiple containers, I don't think there's a
>> way to make some of the tasks scheduled to the same JVM. Not that I'm aware
>> of.
>>
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>> On Mon, Feb 24, 2020 at 8:43 PM David Morin <mo...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Thanks Xintong.
>>> I've noticed than when I use yarn-session.sh with --slots (-s) parameter
>>> but without --container (-n) it creates one task/slot per taskmanager.
>>> Before with the both n and -s it was not the case.
>>> I prefer to use only small container with only one task to scale my
>>> pipeline and of course to prevent from thread-safe issue
>>> Do you think I cannot be confident on that behaviour ?
>>>
>>> Regards,
>>> David
>>>
>>> On 2020/02/22 17:11:25, David Morin <mo...@gmail.com> wrote:
>>> > Hi,
>>> > My app is based on a lib that is not thread safe (yet...).
>>> > In waiting of the patch has been pushed, how can I be sure that my
>>> Sink that uses this lib is in one JVM ?
>>> > Context: I use one Yarn session and send my Flink jobs to this session
>>> >
>>> > Regards,
>>> > David
>>> >
>>>
>>

Re: yarn session: one JVM per task

Posted by David Morin <mo...@gmail.com>.
Hi Xintong,

At the moment I'm using the 1.9.2 with this command:
   yarn-session.sh -d *-s 1* -jm 4096 -tm 4096 -qu "XXX" -nm "MyPipeline"
So, after a lot of tests, I've noticed that if I increase the parallelism
of my Custom Sink, each task is embedded into one TS and, the most
important, each one into one TaskManager (Yarn container in fact).
So, if I understand I have to keep this Flink release (1.9.2) ?

Thanks
David



Le mar. 25 févr. 2020 à 02:02, Xintong Song <to...@gmail.com> a
écrit :

> Depending on your Flink version, the '-n' option might not take effect. It
> is removed in the latest release, but before that there were a few versions
> where this option is neither removed nor taking effect.
>
> Anyway, as long as you have multiple containers, I don't think there's a
> way to make some of the tasks scheduled to the same JVM. Not that I'm aware
> of.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Mon, Feb 24, 2020 at 8:43 PM David Morin <mo...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Thanks Xintong.
>> I've noticed than when I use yarn-session.sh with --slots (-s) parameter
>> but without --container (-n) it creates one task/slot per taskmanager.
>> Before with the both n and -s it was not the case.
>> I prefer to use only small container with only one task to scale my
>> pipeline and of course to prevent from thread-safe issue
>> Do you think I cannot be confident on that behaviour ?
>>
>> Regards,
>> David
>>
>> On 2020/02/22 17:11:25, David Morin <mo...@gmail.com> wrote:
>> > Hi,
>> > My app is based on a lib that is not thread safe (yet...).
>> > In waiting of the patch has been pushed, how can I be sure that my Sink
>> that uses this lib is in one JVM ?
>> > Context: I use one Yarn session and send my Flink jobs to this session
>> >
>> > Regards,
>> > David
>> >
>>
>

Re: yarn session: one JVM per task

Posted by Xintong Song <to...@gmail.com>.
Depending on your Flink version, the '-n' option might not take effect. It
is removed in the latest release, but before that there were a few versions
where this option is neither removed nor taking effect.

Anyway, as long as you have multiple containers, I don't think there's a
way to make some of the tasks scheduled to the same JVM. Not that I'm aware
of.


Thank you~

Xintong Song



On Mon, Feb 24, 2020 at 8:43 PM David Morin <mo...@gmail.com>
wrote:

> Hi,
>
> Thanks Xintong.
> I've noticed than when I use yarn-session.sh with --slots (-s) parameter
> but without --container (-n) it creates one task/slot per taskmanager.
> Before with the both n and -s it was not the case.
> I prefer to use only small container with only one task to scale my
> pipeline and of course to prevent from thread-safe issue
> Do you think I cannot be confident on that behaviour ?
>
> Regards,
> David
>
> On 2020/02/22 17:11:25, David Morin <mo...@gmail.com> wrote:
> > Hi,
> > My app is based on a lib that is not thread safe (yet...).
> > In waiting of the patch has been pushed, how can I be sure that my Sink
> that uses this lib is in one JVM ?
> > Context: I use one Yarn session and send my Flink jobs to this session
> >
> > Regards,
> > David
> >
>

Re: yarn session: one JVM per task

Posted by David Morin <mo...@gmail.com>.
Hi,

Thanks Xintong.
I've noticed than when I use yarn-session.sh with --slots (-s) parameter but without --container (-n) it creates one task/slot per taskmanager. Before with the both n and -s it was not the case.
I prefer to use only small container with only one task to scale my pipeline and of course to prevent from thread-safe issue
Do you think I cannot be confident on that behaviour ?

Regards,
David

On 2020/02/22 17:11:25, David Morin <mo...@gmail.com> wrote: 
> Hi,
> My app is based on a lib that is not thread safe (yet...).
> In waiting of the patch has been pushed, how can I be sure that my Sink that uses this lib is in one JVM ?
> Context: I use one Yarn session and send my Flink jobs to this session
> 
> Regards,
> David
> 

Re: yarn session: one JVM per task

Posted by Xintong Song <to...@gmail.com>.
Hi David,

In general, I don't think you can control all parallel subtasks of a
certain task run in the same JVM process with the current Flink.

If you job scale is very small, one thing you might try is to have only one
task manager in the Flink session cluster. You need to make sure the task
manager has enough cpu/memory resources and slots for running your job.

Thank you~

Xintong Song



On Sun, Feb 23, 2020 at 1:11 AM David Morin <mo...@gmail.com>
wrote:

> Hi,
> My app is based on a lib that is not thread safe (yet...).
> In waiting of the patch has been pushed, how can I be sure that my Sink
> that uses this lib is in one JVM ?
> Context: I use one Yarn session and send my Flink jobs to this session
>
> Regards,
> David
>