Posted to dev@flink.apache.org by Aljoscha Krettek <al...@apache.org> on 2020/09/01 12:24:41 UTC

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Hi,

playing devil's advocate here: should we even make the memory weights 
configurable? We could go with weights that should make sense for most 
cases in the first version and only introduce configurable weights when 
(if) users need them.

Regarding where/how things are configured, I think that most things 
should be a ConfigOption first (thanks for cc'ing me, Stephan!). This makes 
them configurable via flink-conf.yaml and via command line parameters, 
for example "bin/flink run -D memory.foo=bla ...". We can think about 
offering programmatic API for cases where it makes sense, of course.
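
To illustrate, such a weight could be declared roughly like this (a 
sketch only; the key name and default value are placeholders, not 
something this discussion has settled on):

    import org.apache.flink.configuration.ConfigOption;
    import org.apache.flink.configuration.ConfigOptions;

    // Placeholder key and default value, for illustration only.
    public static final ConfigOption<Integer> SOME_CONSUMER_WEIGHT =
        ConfigOptions.key("taskmanager.memory.managed.some-consumer.weight")
            .intType()
            .defaultValue(70)
            .withDescription(
                "Relative weight of this consumer when splitting up managed memory.");

Anything declared as a ConfigOption like this is then settable in 
flink-conf.yaml as well as via "bin/flink run -D ...".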

Regarding naming one of the configurable weights 
"StateBackend-BatchAlgorithm": I think it's not a good idea to be that 
specific, because the option will not age well, for example when we want 
to change which groups of memory consumers are configured together or 
when we add something new.

Best,
Aljoscha

On 31.08.20 08:13, Xintong Song wrote:
> Thanks for the feedback, @Stephan
> 
> 
>    - There is a push to make as much as possible configurable via the main
>> configuration, and not only in code. Specifically values for operations and
>> tuning.
>>      I think it would be more important to have such memory weights in the
>> config than in the program API. /cc Aljoscha
> 
> 
> I can see the benefit that having memory weights in the main configuration
> makes tuning easier, which makes great sense to me. On the other hand, what
> we lose is the flexibility to have different weights for jobs running in
> the same Flink cluster. It seems to me the problem is that we don't have an
> easy way to overwrite job-specific configurations without touching the
> code.
> 
> 
> Given the current status, what if we make the memory weights configurable
> through both the main configuration and the programming API? The main
> configuration should take effect iff the weights are not explicitly
> specified through the programming API. In this way, job cluster users can
> easily tune the weight through the main configuration, while session
> cluster users, if they want to have different weights for jobs, can still
> overwrite the weight through execution configs.
> 
> 
>    - My recommendation would be to keep this as simple as possible. Anything
>> more will make a lot of the configuration code harder, and make it harder
>> for users to understand Flink's memory model.
>>      Making things easy for users to understand is very important in my
>> opinion. In that regard, the main proposal in the FLIP seems better than
>> the alternative proposal listed at the end of the FLIP page.
> 
> +1 from my side.
> 
> 
>    - For simplicity, we could go even further and simply have two memory
>> users at the moment: The operator algorithm/data-structure and the external
>> language process (Python for now).
>>      We never have batch algos and RocksDB mixed; having these as separate
>> options is confusing as it suggests this can be combined arbitrarily. I
>> also think that a slim possibility that we may ever combine this in the
>> future is not enough reason to make it more complex/confusing.
> 
> 
> Good point. +1 for combining batch/rocksdb weights, for they're never mixed
> together. We can even just name it "StateBackend-BatchAlgorithm" to be
> explicit.
> 
> 
> For "external language process", I'm not entirely sure. Future external
> languages may be mixed with Python processes. To avoid later having to
> work out how to share external language memory across different
> languages, I would suggest presenting the concept as "python memory" rather
> than "external language process memory".
> 
> 
> Thank you~
> 
> Xintong Song
> 
> 
> 
> On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org> wrote:
> 
>> Thanks for driving this proposal. A few thoughts on the current design:
>>
>>    - There is a push to make as much as possible configurable via the main
>> configuration, and not only in code. Specifically values for operations and
>> tuning.
>>      I think it would be more important to have such memory weights in the
>> config than in the program API. /cc Aljoscha
>>
>>    - My recommendation would be to keep this as simple as possible. Anything
>> more will make a lot of the configuration code harder, and make it harder
>> for users to understand Flink's memory model.
>>      Making things easy for users to understand is very important in my
>> opinion. In that regard, the main proposal in the FLIP seems better than
>> the alternative proposal listed at the end of the FLIP page.
>>
>>    - For simplicity, we could go even further and simply have two memory
>> users at the moment: The operator algorithm/data-structure and the external
>> language process (Python for now).
>>      We never have batch algos and RocksDB mixed; having these as separate
>> options is confusing as it suggests this can be combined arbitrarily. I
>> also think that a slim possibility that we may ever combine this in the
>> future is not enough reason to make it more complex/confusing.
>>
>>    - I am also not aware of any plans to combine the network and operator
>> memory. Not that it would be infeasible to do this, but I think this would
>> also be orthogonal to this change, and I am not sure this would be solved
>> with static weights. So trying to get network memory into this proposal
>> seems premature to me.
>>
>> Best,
>> Stephan
>>
>>
>> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <to...@gmail.com>
>> wrote:
>>
>>>>
>>>> A quick question: is network memory treated as managed memory now? Or
>>>> will it be in the future?
>>>>
>>> No, network memory is independent of managed memory ATM. And I'm not
>>> aware of any plan to combine these two.
>>>
>>> Any insights there?
>>>
>>> Thank you~
>>>
>>> Xintong Song
>>>
>>>
>>>
>>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <yk...@gmail.com> wrote:
>>>
>>>> A quick question: is network memory treated as managed memory now? Or
>>>> will it be in the future?
>>>>
>>>> Best,
>>>> Kurt
>>>>
>>>>
>>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <to...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi devs,
>>>>>
>>>>> I'd like to open the discussion on FLIP-141[1], which proposes how
>>>>> managed memory should be shared by various use cases within a slot.
>>> This
>>>> is
>>>>> an extension to FLIP-53[2], where we assumed that RocksDB state
>> backend
>>>> and
>>>>> batch operators are the only use cases of managed memory for
>> streaming
>>>> and
>>>>> batch jobs respectively, which is no longer true with the
>> introduction
>>> of
>>>>> Python UDFs.
>>>>>
>>>>> Please note that we have not yet reached consensus between two
>> different
>>>>> designs. The major part of this FLIP describes one of the candidates,
>>>> while
>>>>> the alternative is discussed in the section "Rejected Alternatives".
>> We
>>>> are
>>>>> hoping to borrow intelligence from the community to help us resolve
>> the
>>>>> disagreement.
>>>>>
>>>>> Any feedback would be appreciated.
>>>>>
>>>>> Thank you~
>>>>>
>>>>> Xintong Song
>>>>>
>>>>>
>>>>> [1]
>>>>>
>>>>>
>>>>
>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
>>>>>
>>>>> [2]
>>>>>
>>>>>
>>>>
>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>>>>>
>>>>
>>>
>>
> 


Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Posted by Yangze Guo <ka...@gmail.com>.
Thanks for driving this! The newest version LGTM. +1 for this FLIP.

Best,
Yangze Guo

On Thu, Sep 3, 2020 at 2:11 PM Dian Fu <di...@gmail.com> wrote:
>
> Thanks for driving this FLIP, Xintong! +1 to the updated version.
>
> > On Sep 2, 2020, at 6:09 PM, Xintong Song <to...@gmail.com> wrote:
> >
> > Thanks for the input, Yu.
> >
> > I believe the current proposal should work with RocksDB, or any other state
> > backend, using memory at either the slot or the operator scope. With the
> > proposed approach, all we need is an indicator (e.g., a configuration option)
> > telling us which scope we should calculate the fractions for.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Wed, Sep 2, 2020 at 4:53 PM Yu Li <ca...@gmail.com> wrote:
> >
> >> Thanks for compiling the FLIP Xintong, and +1 for the updated doc.
> >>
> >> Just one supplement for the RocksDB state backend part:
> >>
> >> It's true that currently we're using managed memory at the slot scope.
> >> However, IMHO, we may support setting weights for different stateful
> >> operators (for advanced usage) in the future. For example, users may choose
> >> to set higher weights for join operators over aggregation operators, to give
> >> more memory to those with bigger states. In this case, we may also use
> >> managed memory at the operator scope for state backends. And if I
> >> understand correctly, the current design could cover this case well.
> >>
> >> Best Regards,
> >> Yu
> >>
> >>
> >> On Wed, 2 Sep 2020 at 15:39, Xintong Song <to...@gmail.com> wrote:
> >>
> >>> Thanks all for the feedback and discussion.
> >>>
> >>> I have updated the FLIP, with the following changes.
> >>>
> >>>   - Choose the main proposal over the alternative approach
> >>>   - Combine weights of RocksDB and batch operators
> >>>   - Expose weights through configuration options, rather than via
> >>>   ExecutionConfig.
> >>>   - Add implementation plan.
> >>>
> >>> Please help take another look.
> >>>
> >>> Thank you~
> >>>
> >>> Xintong Song
> >>>
> >>>
> >>>
> >>> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <to...@gmail.com>
> >> wrote:
> >>>
> >>>> Thanks for the inputs, Aljoscha & Till.
> >>>>
> >>>>
> >>>> # Weight Configuration
> >>>>
> >>>>
> >>>> I think exposing the knobs incrementally is a good idea. However, I'm
> >> not
> >>>> sure about non-configurable as the first step.
> >>>>
> >>>>
> >>>> Currently, users can tune memory for rocksdb
> >>>> ('taskmanager.memory.managed.size') and python
> >>>> ('python.fn-execution.[framework|buffer].memory.size') separately,
> >> which
> >>>> practically means any combination of rocksdb and python memory sizes.
> >> If
> >>> we
> >>>> switch to non-configurable weights, that will be a regression compared
> >> to
> >>>> 1.11.
> >>>>
> >>>>
> >>>> Therefore, I think exposing via configuration options might be a good
> >>>> first step. And we can discuss exposing via ExecutionConfig if later we
> >>> see
> >>>> that requirement.
> >>>>
> >>>>
> >>>> # Naming of Weights
> >>>>
> >>>>
> >>>> I'm neutral on "Flink/Internal memory".
> >>>>
> >>>>
> >>>> I think the reason we can combine weights for batch algorithms and
> >> state
> >>>> backends is that they are never mixed together. My only concern
> >>>> with "Flink/Internal memory", which might not be a problem at the moment,
> >>>> is: what if new memory use cases appear in the future that can also be
> >>>> described as "Flink/Internal memory", but are not guaranteed to never be
> >>>> mixed with batch algorithms or state backends?
> >>>>
> >>>>
> >>>> Anyway, I think the naming should not block this FLIP, as long as we
> >> have
> >>>> consensus on combining the two weights for rocksdb and batch
> >> algorithms.
> >>> We
> >>>> can keep the naming discussion open until the implementation phase.
> >>>>
> >>>>
> >>>> Thank you~
> >>>>
> >>>> Xintong Song
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <tr...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Thanks for creating this FLIP Xintong.
> >>>>>
> >>>>> I agree with the previous comments that the memory configuration
> >> should
> >>> be
> >>>>> as easy as possible. Every new knob has the potential to confuse users
> >>>>> and/or allows them to shoot themselves in the foot. Consequently, I am +1
> >>> for
> >>>>> the first proposal in the FLIP since it is simpler.
> >>>>>
> >>>>> Also +1 for Stephan's proposal to combine batch operators' and
> >>>>> RocksDB's memory usage into one weight.
> >>>>>
> >>>>> Concerning the names for the two weights, I fear that we are facing
> >> one
> >>> of
> >>>>> the two hard things in computer science. To add another idea, we could
> >>>>> name
> >>>>> them "Flink memory"/"Internal memory" and "Python memory".
> >>>>>
> >>>>> For the sake of making the scope of the FLIP as small as possible and
> >> to
> >>>>> develop the feature incrementally, I think that Aljoscha's proposal to
> >>>>> make
> >>>>> it non-configurable for the first step sounds like a good idea. As a
> >>> next
> >>>>> step (and also if we see need), we can make the memory weights
> >>>>> configurable
> >>>>> via the configuration. And last, we could expose it via the
> >>>>> ExecutionConfig
> >>>>> if it is required.
> >>>>>
> >>>>> Cheers,
> >>>>> Till

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Posted by Xintong Song <to...@gmail.com>.
Thanks for the suggestion, @Stephan.

DATAPROC makes good sense to me. +1 here
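
Combined with the `taskmanager.memory.managed.consumer-weights` name that 
Till suggested below, the user-facing side could then look something like 
this in flink-conf.yaml (values purely illustrative, not proposed 
defaults):

    taskmanager.memory.managed.consumer-weights: DATAPROC:70,PYTHON:30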

Regarding the Scope, it is meant for calculating fractions from the
weights. The idea is that the algorithm looks into the scopes and
calculates fractions without understanding the individual use cases.
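
As a rough sketch of that calculation (all names made up for 
illustration; this is not the actual implementation):

    import java.util.Map;
    import java.util.Set;

    // fraction(useCase) = weight(useCase) / sum of the weights of the use
    // cases actually present in the given scope (slot or operator).
    static double fractionFor(String useCase,
                              Map<String, Integer> weights,
                              Set<String> presentInScope) {
        int total = presentInScope.stream()
                .mapToInt(c -> weights.getOrDefault(c, 0))
                .sum();
        return total == 0 ? 0.0
                : (double) weights.getOrDefault(useCase, 0) / total;
    }

Use cases that are not present in a scope contribute nothing to the 
denominator, so their share is redistributed to the ones that are.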

I guess I should not have put the Scope in the code block in Declare Use
Cases. This is more of an internal implementation detail than a public
interface. Sorry for the confusion. I copied the declaration
of MemoryUseCase from some local experimental code.

Thank you~

Xintong Song



On Wed, Sep 9, 2020 at 6:15 PM Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
> I read through the FLIP and it looks good to me. One suggestion and one
> question:
>
> Regarding naming, we could call the ROCKSDB/BATCH_OP category DATAPROC
> because this is the memory that goes into holding (and structuring) the
> data.
>
> I am a bit confused about the Scope enum (with values Slot and Op). Do we
> need to store this in the configuration or can we drop this?
> From my understanding, this is transparent already:
>   - When anyone goes to the MemoryManager, they ask for a fraction of the
> Slot's budget.
>   - RocksDB (which is per slot) goes directly to the MemoryManager
>   - Python process (per slot) goes directly to the MemoryManager
>   - Batch algorithms apply their local operator weight before going to the
> MemoryManager, so by the time they allocate memory, it is already the right
> fraction per-slot.
>
> Best,
> Stephan
>
>
> On Fri, Sep 4, 2020 at 3:46 AM Xintong Song <to...@gmail.com> wrote:
>
>> Thanks Till, `taskmanager.memory.managed.consumer-weights` sounds good to
>> me.
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>> On Thu, Sep 3, 2020 at 8:44 PM Till Rohrmann <tr...@apache.org>
>> wrote:
>>
>> > Thanks for updating the FLIP Xintong. It looks good to me. One minor
>> > comment is that we could instead name the configuration parameter
>> > taskmanager.memory.managed.consumer-weights, which might be a bit more
>> > expressive about what this option does.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Thu, Sep 3, 2020 at 12:44 PM Xintong Song <to...@gmail.com>
>> > wrote:
>> >
>> > > Thanks all for the feedback.
>> > >
>> > > FYI, I've opened a voting thread[1] on this.
>> > >
>> > > Thank you~
>> > >
>> > > Xintong Song
>> > >
>> > >
>> > > [1]
>> > >
>> > >
>> >
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-141-Intra-Slot-Managed-Memory-Sharing-td44358.html
>> > >
>> > >
>> > > On Thu, Sep 3, 2020 at 2:54 PM Zhu Zhu <re...@gmail.com> wrote:
>> > >
>> > > > Thanks for proposing this improvement! @Xintong
>> > > > The proposal looks good to me. Agreed that we should make it as
>> simple
>> > as
>> > > > possible for users to understand.
>> > > >
>> > > > Thanks,
>> > > > Zhu

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Posted by Stephan Ewen <se...@apache.org>.
Hi!

I read through the FLIP and it looks good to me. One suggestion and one
question:

Regarding naming, we could call the ROCKSDB/BATCH_OP category DATAPROC
because this is the memory that goes into holding (and structuring) the
data.

I am a bit confused about the Scope enum (with values Slot and Op). Do we
need to store this in the configuration or can we drop this?
From my understanding, this is transparent already:
  - When anyone goes to the MemoryManager, they ask for a fraction of the
Slot's budget.
  - RocksDB (which is per slot) goes directly to the MemoryManager
  - Python process (per slot) goes directly to the MemoryManager
  - Batch algorithms apply their local operator weight before going to the
MemoryManager, so by the time they allocate memory, it is already the right
fraction per-slot (see the sketch below).
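
A minimal sketch of that last point (illustrative names only, not 
Flink's actual MemoryManager API):

    // How a batch operator might derive its managed memory budget from
    // the per-slot fraction of its category; all names are made up.
    static long operatorBudgetBytes(double slotFraction,
                                    int opWeight,
                                    int totalOpWeightInSlot,
                                    long totalManagedBytesInSlot) {
        double opFraction =
                slotFraction * opWeight / (double) totalOpWeightInSlot;
        return (long) (opFraction * totalManagedBytesInSlot);
    }

RocksDB and the Python process, being per-slot consumers, would use the 
slot fraction directly.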

Best,
Stephan


On Fri, Sep 4, 2020 at 3:46 AM Xintong Song <to...@gmail.com> wrote:

> Thanks Till, `taskmanager.memory.managed.consumer-weights` sounds good to
> me.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Thu, Sep 3, 2020 at 8:44 PM Till Rohrmann <tr...@apache.org> wrote:
>
> > Thanks for updating the FLIP Xintong. It looks good to me. One minor
> > comment is that we could name the configuration parameter
> > also taskmanager.memory.managed.consumer-weights which might be a bit
> more
> > expressive what this option does.
> >
> > Cheers,
> > Till
> >
> > On Thu, Sep 3, 2020 at 12:44 PM Xintong Song <to...@gmail.com>
> > wrote:
> >
> > > Thanks all for the feedback.
> > >
> > > FYI, I've opened a voting thread[1] on this.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > > [1]
> > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-141-Intra-Slot-Managed-Memory-Sharing-td44358.html
> > >
> > >
> > > On Thu, Sep 3, 2020 at 2:54 PM Zhu Zhu <re...@gmail.com> wrote:
> > >
> > > > Thanks for proposing this improvement! @Xintong
> > > > The proposal looks good to me. Agreed that we should make it as
> simple
> > as
> > > > possible for users to understand.
> > > >
> > > > Thanks,
> > > > Zhu
> > > >
> > > > Dian Fu <di...@gmail.com> 于2020年9月3日周四 下午2:11写道:
> > > >
> > > > > Thanks for driving this FLIP, Xintong! +1 to the updated version.
> > > > >
> > > > > > 在 2020年9月2日,下午6:09,Xintong Song <to...@gmail.com> 写道:
> > > > > >
> > > > > > Thanks for the input, Yu.
> > > > > >
> > > > > > I believe the current proposal should work with RocksDB, or any
> > other
> > > > > state
> > > > > > backend, using memory at either the slot or the scope. With the
> > > > proposed
> > > > > > approach, all we need is an indicator (e.g., a configuration
> > option)
> > > > > > telling us which scope should we calculate the fractions for.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Sep 2, 2020 at 4:53 PM Yu Li <ca...@gmail.com> wrote:
> > > > > >
> > > > > >> Thanks for compiling the FLIP Xintong, and +1 for the updated
> doc.
> > > > > >>
> > > > > >> Just one supplement for the RocksDB state backend part:
> > > > > >>
> > > > > >> It's true that currently we're using managed memory at the slot
> > > scope.
> > > > > >> However, IMHO, we may support setting weights for different
> > stateful
> > > > > >> operators (for advanced usage) in future. For example, users may
> > > > choose
> > > > > to
> > > > > >> set higher weights for join operator over aggregation operator,
> to
> > > > give
> > > > > >> more memory to those with bigger states. In this case, we may
> also
> > > use
> > > > > >> managed memory at the operator scope for state backends. And if
> I
> > > > > >> understand correctly, the current design could cover this case
> > well.
> > > > > >>
> > > > > >> Best Regards,
> > > > > >> Yu
> > > > > >>
> > > > > >>
> > > > > >> On Wed, 2 Sep 2020 at 15:39, Xintong Song <
> tonysong820@gmail.com>
> > > > > wrote:
> > > > > >>
> > > > > >>> Thanks all for the feedback and discussion.
> > > > > >>>
> > > > > >>> I have updated the FLIP, with the following changes.
> > > > > >>>
> > > > > >>>   - Choose the main proposal over the alternative approach
> > > > > >>>   - Combine weights of RocksDB and batch operators
> > > > > >>>   - Expose weights through configuration options, rather than
> via
> > > > > >>>   ExecutionConfig.
> > > > > >>>   - Add implementation plan.
> > > > > >>>
> > > > > >>> Please help take another look.
> > > > > >>>
> > > > > >>> Thank you~
> > > > > >>>
> > > > > >>> Xintong Song
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <
> > tonysong820@gmail.com
> > > >
> > > > > >> wrote:
> > > > > >>>
> > > > > >>>> Thanks for the inputs, Aljoscha & Till.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> # Weight Configuration
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> I think exposing the knobs incrementally is a good idea.
> > However,
> > > > I'm
> > > > > >> not
> > > > > >>>> sure about non-configurable as the first step.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Currently, users can tune memory for rocksdb
> > > > > >>>> ('taskmanager.memory.managed.size') and python
> > > > > >>>> ('python.fn-execution.[framework|buffer].memory.size')
> > separately,
> > > > > >> which
> > > > > >>>> practically means any combination of rocksdb and python memory
> > > > sizes.
> > > > > >> If
> > > > > >>> we
> > > > > >>>> switch to non-configurable weights, that will be a regression
> > > > compared
> > > > > >> to
> > > > > >>>> 1.11.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Therefore, I think exposing via configuration options might
> be a
> > > > good
> > > > > >>>> first step. And we can discuss exposing via ExecutionConfig if
> > > later
> > > > > we
> > > > > >>> see
> > > > > >>>> that requirement.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> # Naming of Weights
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> I'm neutral for "Flink/Internal memory".
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> I think the reason we can combine weights for batch algorithms
> > and
> > > > > >> state
> > > > > >>>> backends is that they are never mixed together. My only
> concern
> > > > > >>>> for "Flink/Internal memory", which might not be a problem at
> the
> > > > > >> moment,
> > > > > >>> is
> > > > > >>>> that what if new memory use cases appear in the future, which
> > can
> > > > also
> > > > > >> be
> > > > > >>>> described by "Flink/Internal memory" but is not guaranteed not
> > > mixed
> > > > > >> with
> > > > > >>>> batch algorithms or state backends?
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Anyway, I think the naming should not block this FLIP, as long
> > as
> > > we
> > > > > >> have
> > > > > >>>> consensus on combining the two weights for rocksdb and batch
> > > > > >> algorithms.
> > > > > >>> We
> > > > > >>>> can keep the naming discussion open until the implementation
> > > phase.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Thank you~
> > > > > >>>>
> > > > > >>>> Xintong Song
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <
> > > trohrmann@apache.org
> > > > >
> > > > > >>>> wrote:
> > > > > >>>>
> > > > > >>>>> Thanks for creating this FLIP Xintong.
> > > > > >>>>>
> > > > > >>>>> I agree with the previous comments that the memory
> > configuration
> > > > > >> should
> > > > > >>> be
> > > > > >>>>> as easy as possible. Every new knob has the potential to
> > confuse
> > > > > users
> > > > > >>>>> and/or allows him to shoot himself in the foot.
> Consequently, I
> > > am
> > > > +1
> > > > > >>> for
> > > > > >>>>> the first proposal in the FLIP since it is simpler.
> > > > > >>>>>
> > > > > >>>>> Also +1 for Stephan's proposal to combine batch operator's
> and
> > > > > >>>>> RocksDB's memory usage into one weight.
> > > > > >>>>>
> > > > > >>>>> Concerning the names for the two weights, I fear that we are
> > > facing
> > > > > >> one
> > > > > >>> of
> > > > > >>>>> the two hard things in computer science. To add another idea,
> > we
> > > > > could
> > > > > >>>>> name
> > > > > >>>>> them "Flink memory"/"Internal memory" and "Python memory".
> > > > > >>>>>
> > > > > >>>>> For the sake of making the scope of the FLIP as small as
> > possible
> > > > and
> > > > > >> to
> > > > > >>>>> develop the feature incrementally, I think that Aljoscha's
> > > proposal
> > > > > to
> > > > > >>>>> make
> > > > > >>>>> it non-configurable for the first step sounds like a good
> idea.
> > > As
> > > > a
> > > > > >>> next
> > > > > >>>>> step (and also if we see need), we can make the memory
> weights
> > > > > >>>>> configurable
> > > > > >>>>> via the configuration. And last, we could expose it via the
> > > > > >>>>> ExecutionConfig
> > > > > >>>>> if it is required.
> > > > > >>>>>
> > > > > >>>>> Cheers,
> > > > > >>>>> Till
> > > > > >>>>>
> > > > > >>>>> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <
> > > > aljoscha@apache.org
> > > > > >
> > > > > >>>>> wrote:
> > > > > >>>>>
> > > > > >>>>>> Hi,
> > > > > >>>>>>
> > > > > >>>>>> playing devils advocate here: should we even make the memory
> > > > weights
> > > > > >>>>>> configurable? We could go with weights that should make
> sense
> > > for
> > > > > >> most
> > > > > >>>>>> cases in the first version and only introduce configurable
> > > weights
> > > > > >>> when
> > > > > >>>>>> (if) users need them.
> > > > > >>>>>>
> > > > > >>>>>> Regarding where/how things are configured, I think that most
> > > > things
> > > > > >>>>>> should be a ConfigOption first (Thanks cc'in me, Stephan!).
> > This
> > > > > >> makes
> > > > > >>>>>> them configurable via flink-conf.yaml and via command line
> > > > > >> parameters,
> > > > > >>>>>> for example "bin/flink run -D memory.foo=bla ...". We can
> > think
> > > > > >> about
> > > > > >>>>>> offering programmatic API for cases where it makes sense, of
> > > > course.
> > > > > >>>>>>
> > > > > >>>>>> Regarding naming one of the configurable weights
> > > > > >>>>>> "StateBackend-BatchAlgorithm". I think it's not a good idea
> to
> > > be
> > > > > >> that
> > > > > >>>>>> specific because the option will not age well. For example
> > when
> > > we
> > > > > >>> want
> > > > > >>>>>> to change which group of memory consumers are configured
> > > together
> > > > or
> > > > > >>>>>> when we add something new.
> > > > > >>>>>>
> > > > > >>>>>> Best,
> > > > > >>>>>> Aljoscha
> > > > > >>>>>>
> > > > > >>>>>> On 31.08.20 08:13, Xintong Song wrote:
> > > > > >>>>>>> Thanks for the feedback, @Stephan
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>   - There is a push to make as much as possible
> configurable
> > > via
> > > > > >>> the
> > > > > >>>>>> main
> > > > > >>>>>>>> configuration, and not only in code. Specifically values
> for
> > > > > >>>>> operations
> > > > > >>>>>> and
> > > > > >>>>>>>> tuning.
> > > > > >>>>>>>>     I think it would be more important to have such memory
> > > > > >> weights
> > > > > >>>>> in
> > > > > >>>>>> the
> > > > > >>>>>>>> config, compared to in the program API. /cc Aljoscha
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> I can see the benefit that having memory weights in the main
> > > > > >>>>>>> configuration makes tuning easier, which makes great sense to
> > > > > >>>>>>> me. On the other hand, what we lose is the flexibility to have
> > > > > >>>>>>> different weights for jobs running in the same Flink cluster. It
> > > > > >>>>>>> seems to me the problem is that we don't have an easy way to
> > > > > >>>>>>> overwrite job-specific configurations without touching the code.
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> Given the current status, what if we make the memory weights
> > > > > >>>>>>> configurable through both the main configuration and the
> > > > > >>>>>>> programming API? The main configuration should take effect iff
> > > > > >>>>>>> the weights are not explicitly specified through the programming
> > > > > >>>>>>> API. In this way, job cluster users can easily tune the weight
> > > > > >>>>>>> through the main configuration, while session cluster users, if
> > > > > >>>>>>> they want to have different weights for jobs, can still
> > > > > >>>>>>> overwrite the weight through execution configs.
> > > > > >>>>>>>
> > > > > >>>>>>>
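
A minimal sketch of the precedence described above, with hypothetical names
rather than Flink's actual API: a weight set through the programming API wins,
and the main configuration serves as the fallback.

    import java.util.OptionalInt;

    final class WeightResolution {
        // Effective weight: the value explicitly set through the programming
        // API, if any; otherwise the value from the main configuration.
        static int effectiveWeight(OptionalInt fromExecutionConfig, int fromFlinkConf) {
            return fromExecutionConfig.orElse(fromFlinkConf);
        }
    }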
> > > > > >>>>>>>   - My recommendation would be to keep this as simple as
> > > > > >> possible.
> > > > > >>>>> This
> > > > > >>>>>>>> will make a lot of configuration code harder, and make it
> > > harder
> > > > > >>> for
> > > > > >>>>>> users
> > > > > >>>>>>>> to understand Flink's memory model.
> > > > > >>>>>>>>     Making things as easy for users to understand is very
> > > > > >>> important
> > > > > >>>>> in
> > > > > >>>>>> my
> > > > > >>>>>>>> opinion. In that regard, the main proposal in the FLIP
> seems
> > > > > >> better
> > > > > >>>>> than
> > > > > >>>>>>>> the alternative proposal listed at the end of the FLIP
> page.
> > > > > >>>>>>>
> > > > > >>>>>>> +1 from my side.
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>   - For the simplicity, we could go even further and simply
> > > have
> > > > > >>> two
> > > > > >>>>>> memory
> > > > > >>>>>>>> users at the moment: The operator algorithm/data-structure
> > and
> > > > > >> the
> > > > > >>>>>> external
> > > > > >>>>>>>> language process (Python for now).
> > > > > >>>>>>>>     We never have batch algos and RocksDB mixed; having these
> > > > > >>>>>>>> as separate options is confusing, as it suggests they can be
> > > > > >>>>>>>> combined arbitrarily. I also think that the slim possibility
> > > > > >>>>>>>> that we may ever combine them in the future is not enough
> > > > > >>>>>>>> reason to make it more complex/confusing.
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> Good point. +1 for combining batch/rocksdb weights, for they're
> > > > > >>>>>>> never mixed together. We can even just name it
> > > > > >>>>>>> "StateBackend-BatchAlgorithm" to be explicit.
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> For "external language process", I'm not entirely sure.
> > Future
> > > > > >>>>> external
> > > > > >>>>>>> languages are possibly mixed with python processes. To
> avoid
> > > > later
> > > > > >>>>>>> considering how to share external language memory across
> > > > different
> > > > > >>>>>>> languages, I would suggest to present the concept as
> "python
> > > > > >> memory"
> > > > > >>>>>> rather
> > > > > >>>>>>> than "external language process memory".
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> Thank you~
> > > > > >>>>>>>
> > > > > >>>>>>> Xintong Song
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <
> > > sewen@apache.org>
> > > > > >>>>> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> Thanks for driving this proposal. A few thoughts on the
> > > current
> > > > > >>>>> design:
> > > > > >>>>>>>>
> > > > > >>>>>>>>   - There is a push to make as much as possible
> configurable
> > > via
> > > > > >>> the
> > > > > >>>>>> main
> > > > > >>>>>>>> configuration, and not only in code. Specifically values
> for
> > > > > >>>>> operations
> > > > > >>>>>> and
> > > > > >>>>>>>> tuning.
> > > > > >>>>>>>>     I think it would be more important to have such memory
> > > > > >> weights
> > > > > >>>>> in
> > > > > >>>>>> the
> > > > > >>>>>>>> config, compared to in the program API. /cc Aljoscha
> > > > > >>>>>>>>
> > > > > >>>>>>>>   - My recommendation would be to keep this as simple as
> > > > > >> possible.
> > > > > >>>>> This
> > > > > >>>>>>>> will make a lot of configuration code harder, and make it
> > > harder
> > > > > >>> for
> > > > > >>>>>> users
> > > > > >>>>>>>> to understand Flink's memory model.
> > > > > >>>>>>>>     Making things as easy for users to understand is very
> > > > > >>> important
> > > > > >>>>> in
> > > > > >>>>>> my
> > > > > >>>>>>>> opinion. In that regard, the main proposal in the FLIP
> seems
> > > > > >> better
> > > > > >>>>> than
> > > > > >>>>>>>> the alternative proposal listed at the end of the FLIP
> page.
> > > > > >>>>>>>>
> > > > > >>>>>>>>   - For the simplicity, we could go even further and
> simply
> > > have
> > > > > >>> two
> > > > > >>>>>> memory
> > > > > >>>>>>>> users at the moment: The operator algorithm/data-structure
> > and
> > > > > >> the
> > > > > >>>>>> external
> > > > > >>>>>>>> language process (Python for now).
> > > > > >>>>>>>>     We never have batch algos and RocksDB mixed; having these
> > > > > >>>>>>>> as separate options is confusing, as it suggests they can be
> > > > > >>>>>>>> combined arbitrarily. I also think that the slim possibility
> > > > > >>>>>>>> that we may ever combine them in the future is not enough
> > > > > >>>>>>>> reason to make it more complex/confusing.
> > > > > >>>>>>>>
> > > > > >>>>>>>>   - I am also not aware of any plans to combine the
> network
> > > and
> > > > > >>>>>> operator
> > > > > >>>>>>>> memory. Not that it would be infeasible to do this, but I
> > > think
> > > > > >>> this
> > > > > >>>>>> would
> > > > > >>>>>>>> also be orthogonal to this change, and I am not sure this
> > > would
> > > > > >> be
> > > > > >>>>>> solved
> > > > > >>>>>>>> with static weights. So trying to get network memory into this
> > > > > >>>>>>>> proposal seems premature to me.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Best,
> > > > > >>>>>>>> Stephan
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <
> > > > > >>> tonysong820@gmail.com
> > > > > >>>>>>
> > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> A quick question: is network memory treated as managed
> > > > > >>>>>>>>>> memory now? Or in the future?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>> No, network memory is independent of managed memory ATM. And
> > > > > >>>>>>>>> I'm not aware of any plan to combine these two.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Any insights there?
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Thank you~
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Xintong Song
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <
> > ykt836@gmail.com
> > > >
> > > > > >>>>> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> A quick question: is network memory treated as managed
> > > > > >>>>>>>>>> memory now? Or in the future?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Best,
> > > > > >>>>>>>>>> Kurt
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <
> > > > > >>>>> tonysong820@gmail.com>
> > > > > >>>>>>>>>> wrote:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> Hi devs,
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> I'd like to bring the discussion over FLIP-141[1], which
> > > > > >>>>>>>>>>> proposes how managed memory should be shared by various use
> > > > > >>>>>>>>>>> cases within a slot. This is an extension to FLIP-53[2],
> > > > > >>>>>>>>>>> where we assumed that RocksDB state backend and batch
> > > > > >>>>>>>>>>> operators are the only use cases of managed memory for
> > > > > >>>>>>>>>>> streaming and batch jobs respectively, which is no longer
> > > > > >>>>>>>>>>> true with the introduction of Python UDFs.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Please note that we have not reached consensus between the
> > > > > >>>>>>>>>>> two different designs. The major part of this FLIP describes
> > > > > >>>>>>>>>>> one of the candidates, while the alternative is discussed in
> > > > > >>>>>>>>>>> the section "Rejected Alternatives". We are hoping to draw
> > > > > >>>>>>>>>>> on the community's insight to help us resolve the
> > > > > >>>>>>>>>>> disagreement.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Any feedback would be appreciated.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Thank you~
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Xintong Song
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> [1]
> > > > > >>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> [2]
> > > > > >>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> > > > > >>>>>>>>>>>

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Posted by Xintong Song <to...@gmail.com>.
Thanks Till, `taskmanager.memory.managed.consumer-weights` sounds good to
me.
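
For illustration, with that name the option might end up looking along these
lines in flink-conf.yaml (the consumer names and the weight syntax below are
placeholders; the exact format is still to be settled during implementation):

    taskmanager.memory.managed.consumer-weights: STATE_BACKEND_BATCH:70,PYTHON:30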

Thank you~

Xintong Song



On Thu, Sep 3, 2020 at 8:44 PM Till Rohrmann <tr...@apache.org> wrote:

> Thanks for updating the FLIP Xintong. It looks good to me. One minor
> comment is that we could instead name the configuration parameter
> taskmanager.memory.managed.consumer-weights, which might be a bit more
> expressive about what this option does.
>
> Cheers,
> Till
>
> On Thu, Sep 3, 2020 at 12:44 PM Xintong Song <to...@gmail.com>
> wrote:
>
> > Thanks all for the feedback.
> >
> > FYI, I've opened a voting thread[1] on this.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > [1]
> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-141-Intra-Slot-Managed-Memory-Sharing-td44358.html
> >
> >
> > On Thu, Sep 3, 2020 at 2:54 PM Zhu Zhu <re...@gmail.com> wrote:
> >
> > > Thanks for proposing this improvement! @Xintong
> > > The proposal looks good to me. Agreed that we should make it as simple as
> > > possible for users to understand.
> > >
> > > Thanks,
> > > Zhu
> > >
On Thu, Sep 3, 2020 at 2:11 PM, Dian Fu <di...@gmail.com> wrote:
> > >
> > > > Thanks for driving this FLIP, Xintong! +1 to the updated version.
> > > >
> > > > > On Sep 2, 2020, at 6:09 PM, Xintong Song <to...@gmail.com> wrote:
> > > > >
> > > > > Thanks for the input, Yu.
> > > > >
> > > > > I believe the current proposal should work with RocksDB, or any other
> > > > > state backend, using managed memory at either the slot or the operator
> > > > > scope. With the proposed approach, all we need is an indicator (e.g., a
> > > > > configuration option) telling us which scope we should calculate the
> > > > > fractions for.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
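A minimal sketch of the fraction calculation being described, with hypothetical
names rather than Flink's actual classes: each consumer present in the chosen
scope gets weight / sum-of-present-weights of the managed memory, so absent
consumers do not leave memory unused.

    import java.util.HashMap;
    import java.util.Map;

    final class ManagedMemoryFractions {
        // Only the consumers actually present in the scope count towards the
        // total weight.
        static Map<String, Double> fractions(Map<String, Integer> presentWeights) {
            int total = presentWeights.values().stream().mapToInt(Integer::intValue).sum();
            Map<String, Double> result = new HashMap<>();
            presentWeights.forEach((consumer, weight) ->
                    result.put(consumer, total == 0 ? 0.0 : weight / (double) total));
            return result;
        }
    }

For example, with weights {STATE_BACKEND_BATCH=70, PYTHON=30}, a slot running
both consumers yields fractions 0.7 and 0.3, while a slot running only Python
UDFs yields 1.0 for Python.
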
> > > > > On Wed, Sep 2, 2020 at 4:53 PM Yu Li <ca...@gmail.com> wrote:
> > > > >
> > > > >> Thanks for compiling the FLIP Xintong, and +1 for the updated doc.
> > > > >>
> > > > >> Just one addition regarding the RocksDB state backend part:
> > > > >>
> > > > >> It's true that currently we're using managed memory at the slot scope.
> > > > >> However, IMHO, we may support setting weights for different stateful
> > > > >> operators (for advanced usage) in the future. For example, users may
> > > > >> choose to set higher weights for the join operator than for the
> > > > >> aggregation operator, to give more memory to operators with bigger
> > > > >> states. In this case, we may also use managed memory at the operator
> > > > >> scope for state backends. And if I understand correctly, the current
> > > > >> design could cover this case well.
> > > > >>
> > > > >> Best Regards,
> > > > >> Yu
> > > > >>
> > > > >>
> > > > >> On Wed, 2 Sep 2020 at 15:39, Xintong Song <to...@gmail.com>
> > > > wrote:
> > > > >>
> > > > >>> Thanks all for the feedback and discussion.
> > > > >>>
> > > > >>> I have updated the FLIP, with the following changes.
> > > > >>>
> > > > >>>   - Choose the main proposal over the alternative approach
> > > > >>>   - Combine weights of RocksDB and batch operators
> > > > >>>   - Expose weights through configuration options, rather than via
> > > > >>>   ExecutionConfig.
> > > > >>>   - Add implementation plan.
> > > > >>>
> > > > >>> Please help take another look.
> > > > >>>
> > > > >>> Thank you~
> > > > >>>
> > > > >>> Xintong Song
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <
> tonysong820@gmail.com
> > >
> > > > >> wrote:
> > > > >>>
> > > > >>>> Thanks for the inputs, Aljoscha & Till.
> > > > >>>>
> > > > >>>>
> > > > >>>> # Weight Configuration
> > > > >>>>
> > > > >>>>
> > > > >>>> I think exposing the knobs incrementally is a good idea. However,
> > > > >>>> I'm not sure about non-configurable as the first step.
> > > > >>>>
> > > > >>>>
> > > > >>>> Currently, users can tune memory for RocksDB
> > > > >>>> ('taskmanager.memory.managed.size') and Python
> > > > >>>> ('python.fn-execution.[framework|buffer].memory.size') separately,
> > > > >>>> which practically means any combination of RocksDB and Python memory
> > > > >>>> sizes. If we switch to non-configurable weights, that will be a
> > > > >>>> regression compared to 1.11.
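
For reference, these are the 1.11 options being referred to; the values here
are only an illustration of one possible combination:

    taskmanager.memory.managed.size: 1gb
    python.fn-execution.framework.memory.size: 64mb
    python.fn-execution.buffer.memory.size: 15mb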
> > > > >>>>
> > > > >>>>
> > > > >>>> Therefore, I think exposing via configuration options might be a
> > > good
> > > > >>>> first step. And we can discuss exposing via ExecutionConfig if
> > later
> > > > we
> > > > >>> see
> > > > >>>> that requirement.
> > > > >>>>
> > > > >>>>
> > > > >>>> # Naming of Weights
> > > > >>>>
> > > > >>>>
> > > > >>>> I'm neutral on "Flink/Internal memory".
> > > > >>>>
> > > > >>>>
> > > > >>>> I think the reason we can combine weights for batch algorithms and
> > > > >>>> state backends is that they are never mixed together. My only concern
> > > > >>>> with "Flink/Internal memory", which might not be a problem at the
> > > > >>>> moment, is: what if new memory use cases appear in the future that
> > > > >>>> can also be described by "Flink/Internal memory" but are not
> > > > >>>> guaranteed not to be mixed with batch algorithms or state backends?
> > > > >>>>
> > > > >>>>
> > > > >>>> Anyway, I think the naming should not block this FLIP, as long as
> > > > >>>> we have consensus on combining the two weights for rocksdb and batch
> > > > >>>> algorithms. We can keep the naming discussion open until the
> > > > >>>> implementation phase.
> > > > >>>>
> > > > >>>>
> > > > >>>> Thank you~
> > > > >>>>
> > > > >>>> Xintong Song

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Posted by Till Rohrmann <tr...@apache.org>.
Thanks for updating the FLIP Xintong. It looks good to me. One minor
comment is that we could instead name the configuration parameter
taskmanager.memory.managed.consumer-weights, which might be a bit more
expressive about what this option does.

Cheers,
Till

On Thu, Sep 3, 2020 at 12:44 PM Xintong Song <to...@gmail.com> wrote:

> Thanks all for the feedback.
>
> FYI, I've opened a voting thread[1] on this.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-141-Intra-Slot-Managed-Memory-Sharing-td44358.html

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Posted by Xintong Song <to...@gmail.com>.
Thanks all for the feedback.

FYI, I've opened a voting thread[1] on this.

Thank you~

Xintong Song


[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-141-Intra-Slot-Managed-Memory-Sharing-td44358.html


On Thu, Sep 3, 2020 at 2:54 PM Zhu Zhu <re...@gmail.com> wrote:

> Thanks for proposing this improvement! @Xintong
> The proposal looks good to me. Agreed that we should make it as simple as
> possible for users to understand.
>
> Thanks,
> Zhu
>
> Dian Fu <di...@gmail.com> 于2020年9月3日周四 下午2:11写道:
>
> > Thanks for driving this FLIP, Xintong! +1 to the updated version.
> >
> > > 在 2020年9月2日,下午6:09,Xintong Song <to...@gmail.com> 写道:
> > >
> > > Thanks for the input, Yu.
> > >
> > > I believe the current proposal should work with RocksDB, or any other
> > state
> > > backend, using memory at either the slot or the scope. With the
> proposed
> > > approach, all we need is an indicator (e.g., a configuration option)
> > > telling us which scope should we calculate the fractions for.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Wed, Sep 2, 2020 at 4:53 PM Yu Li <ca...@gmail.com> wrote:
> > >
> > >> Thanks for compiling the FLIP Xintong, and +1 for the updated doc.
> > >>
> > >> Just one supplement for the RocksDB state backend part:
> > >>
> > >> It's true that currently we're using managed memory at the slot scope.
> > >> However, IMHO, we may support setting weights for different stateful
> > >> operators (for advanced usage) in future. For example, users may
> choose
> > to
> > >> set higher weights for join operator over aggregation operator, to
> give
> > >> more memory to those with bigger states. In this case, we may also use
> > >> managed memory at the operator scope for state backends. And if I
> > >> understand correctly, the current design could cover this case well.
> > >>
> > >> Best Regards,
> > >> Yu
> > >>
> > >>
> > >> On Wed, 2 Sep 2020 at 15:39, Xintong Song <to...@gmail.com>
> > wrote:
> > >>
> > >>> Thanks all for the feedback and discussion.
> > >>>
> > >>> I have updated the FLIP, with the following changes.
> > >>>
> > >>>   - Choose the main proposal over the alternative approach
> > >>>   - Combine weights of RocksDB and batch operators
> > >>>   - Expose weights through configuration options, rather than via
> > >>>   ExecutionConfig.
> > >>>   - Add implementation plan.
> > >>>
> > >>> Please help take another look.
> > >>>
> > >>> Thank you~
> > >>>
> > >>> Xintong Song
> > >>>
> > >>>
> > >>>
> > >>> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <to...@gmail.com>
> > >> wrote:
> > >>>
> > >>>> Thanks for the inputs, Aljoscha & Till.
> > >>>>
> > >>>>
> > >>>> # Weight Configuration
> > >>>>
> > >>>>
> > >>>> I think exposing the knobs incrementally is a good idea. However,
> I'm
> > >> not
> > >>>> sure about non-configurable as the first step.
> > >>>>
> > >>>>
> > >>>> Currently, users can tune memory for rocksdb
> > >>>> ('taskmanager.memory.managed.size') and python
> > >>>> ('python.fn-execution.[framework|buffer].memory.size') separately,
> > >> which
> > >>>> practically means any combination of rocksdb and python memory
> sizes.
> > >> If
> > >>> we
> > >>>> switch to non-configurable weights, that will be a regression
> compared
> > >> to
> > >>>> 1.11.
> > >>>>
> > >>>>
> > >>>> Therefore, I think exposing via configuration options might be a
> good
> > >>>> first step. And we can discuss exposing via ExecutionConfig if later
> > we
> > >>> see
> > >>>> that requirement.
> > >>>>
> > >>>>
> > >>>> # Naming of Weights
> > >>>>
> > >>>>
> > >>>> I'm neutral for "Flink/Internal memory".
> > >>>>
> > >>>>
> > >>>> I think the reason we can combine weights for batch algorithms and
> > >> state
> > >>>> backends is that they are never mixed together. My only concern
> > >>>> for "Flink/Internal memory", which might not be a problem at the
> > >> moment,
> > >>> is
> > >>>> that what if new memory use cases appear in the future, which can
> also
> > >> be
> > >>>> described by "Flink/Internal memory" but is not guaranteed not mixed
> > >> with
> > >>>> batch algorithms or state backends?
> > >>>>
> > >>>>
> > >>>> Anyway, I think the naming should not block this FLIP, as long as we
> > >> have
> > >>>> consensus on combining the two weights for rocksdb and batch
> > >> algorithms.
> > >>> We
> > >>>> can keep the naming discussion open until the implementation phase.
> > >>>>
> > >>>>
> > >>>> Thank you~
> > >>>>
> > >>>> Xintong Song
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <trohrmann@apache.org
> >
> > >>>> wrote:
> > >>>>
> > >>>>> Thanks for creating this FLIP Xintong.
> > >>>>>
> > >>>>> I agree with the previous comments that the memory configuration
> > >> should
> > >>> be
> > >>>>> as easy as possible. Every new knob has the potential to confuse
> > users
> > >>>>> and/or allows him to shoot himself in the foot. Consequently, I am
> +1
> > >>> for
> > >>>>> the first proposal in the FLIP since it is simpler.
> > >>>>>
> > >>>>> Also +1 for Stephan's proposal to combine batch operator's and
> > >>>>> RocksDB's memory usage into one weight.
> > >>>>>
> > >>>>> Concerning the names for the two weights, I fear that we are facing
> > >> one
> > >>> of
> > >>>>> the two hard things in computer science. To add another idea, we
> > could
> > >>>>> name
> > >>>>> them "Flink memory"/"Internal memory" and "Python memory".
> > >>>>>
> > >>>>> For the sake of making the scope of the FLIP as small as possible
> and
> > >> to
> > >>>>> develop the feature incrementally, I think that Aljoscha's proposal
> > to
> > >>>>> make
> > >>>>> it non-configurable for the first step sounds like a good idea. As
> a
> > >>> next
> > >>>>> step (and also if we see need), we can make the memory weights
> > >>>>> configurable
> > >>>>> via the configuration. And last, we could expose it via the
> > >>>>> ExecutionConfig
> > >>>>> if it is required.
> > >>>>>
> > >>>>> Cheers,
> > >>>>> Till
> > >>>>>
> > >>>>> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <
> aljoscha@apache.org
> > >
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> playing devils advocate here: should we even make the memory
> weights
> > >>>>>> configurable? We could go with weights that should make sense for
> > >> most
> > >>>>>> cases in the first version and only introduce configurable weights
> > >>> when
> > >>>>>> (if) users need them.
> > >>>>>>
> > >>>>>> Regarding where/how things are configured, I think that most
> things
> > >>>>>> should be a ConfigOption first (Thanks cc'in me, Stephan!). This
> > >> makes
> > >>>>>> them configurable via flink-conf.yaml and via command line
> > >> parameters,
> > >>>>>> for example "bin/flink run -D memory.foo=bla ...". We can think
> > >> about
> > >>>>>> offering programmatic API for cases where it makes sense, of
> course.
> > >>>>>>
> > >>>>>> Regarding naming one of the configurable weights
> > >>>>>> "StateBackend-BatchAlgorithm". I think it's not a good idea to be
> > >> that
> > >>>>>> specific because the option will not age well. For example when we
> > >>> want
> > >>>>>> to change which group of memory consumers are configured together
> or
> > >>>>>> when we add something new.
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Aljoscha
> > >>>>>>
> > >>>>>> On 31.08.20 08:13, Xintong Song wrote:
> > >>>>>>> Thanks for the feedback, @Stephan
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>   - There is a push to make as much as possible configurable via
> > >>> the
> > >>>>>> main
> > >>>>>>>> configuration, and not only in code. Specifically values for
> > >>>>> operations
> > >>>>>> and
> > >>>>>>>> tuning.
> > >>>>>>>>     I think it would be more important to have such memory
> > >> weights
> > >>>>> in
> > >>>>>> the
> > >>>>>>>> config, compared to in the program API. /cc Aljoscha
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> I can see the benefit that having memory weights in the main
> > >>>>>> configuration
> > >>>>>>> makes tuning easier, which makes great sense to me. On the other
> > >>> hand,
> > >>>>>> what
> > >>>>>>> we lose is the flexibility to have different weights for jobs
> > >>> running
> > >>>>> in
> > >>>>>>> the same Flink cluster. It seems to me the problem is that we
> > >> don't
> > >>>>> have
> > >>>>>> an
> > >>>>>>> easy way to overwrite job-specific configurations without
> touching
> > >>> the
> > >>>>>>> codes.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Given the current status, what if we make the memory weights
> > >>>>> configurable
> > >>>>>>> through both the main configuration and the programming API? The
> > >>> main
> > >>>>>>> configuration should take effect iff the weights are not
> > >> explicitly
> > >>>>>>> specified through the programming API. In this way, job cluster
> > >>> users
> > >>>>> can
> > >>>>>>> easily tune the weight through the main configuration, while
> > >> session
> > >>>>>>> cluster users, if they want to have different weights for jobs,
> > >> can
> > >>>>> still
> > >>>>>>> overwrite the weight through execution configs.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>   - My recommendation would be to keep this as simple as
> > >> possible.
> > >>>>> This
> > >>>>>>>> will make a lot of configuration code harder, and make it harder
> > >>> for
> > >>>>>> users
> > >>>>>>>> to understand Flink's memory model.
> > >>>>>>>>     Making things as easy for users to understand is very
> > >>> important
> > >>>>> in
> > >>>>>> my
> > >>>>>>>> opinion. In that regard, the main proposal in the FLIP seems
> > >> better
> > >>>>> than
> > >>>>>>>> the alternative proposal listed at the end of the FLIP page.
> > >>>>>>>
> > >>>>>>> +1 from my side.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>   - For the simplicity, we could go even further and simply have
> > >>> two
> > >>>>>> memory
> > >>>>>>>> users at the moment: The operator algorithm/data-structure and
> > >> the
> > >>>>>> external
> > >>>>>>>> language process (Python for now).
> > >>>>>>>>     We never have batch algos and RocksDB mixed, having this as
> > >>>>>> separate
> > >>>>>>>> options is confusing as it suggests this can be combined
> > >>>>> arbitrarily. I
> > >>>>>>>> also think that a slim possibility that we may ever combine this
> > >> in
> > >>>>> the
> > >>>>>>>> future is not enough reason to make it more complex/confusing.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Good point. +1 for combining batch/rocksdb weights, for they're
> > >>>>>>> never mixed together. We can even just name it
> > >>>>>>> "StateBackend-BatchAlgorithm" to be explicit.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> For "external language process", I'm not entirely sure. Future
> > >>>>> external
> > >>>>>>> languages are possibly mixed with python processes. To avoid
> later
> > >>>>>>> considering how to share external language memory across
> different
> > >>>>>>> languages, I would suggest to present the concept as "python
> > >> memory"
> > >>>>>> rather
> > >>>>>>> than "external language process memory".
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Thank you~
> > >>>>>>>
> > >>>>>>> Xintong Song
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org>
> > >>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Thanks for driving this proposal. A few thoughts on the current
> > >>>>> design:
> > >>>>>>>>
> > >>>>>>>>   - There is a push to make as much as possible configurable via
> > >>> the
> > >>>>>> main
> > >>>>>>>> configuration, and not only in code. Specifically values for
> > >>>>> operations
> > >>>>>> and
> > >>>>>>>> tuning.
> > >>>>>>>>     I think it would be more important to have such memory
> > >> weights
> > >>>>> in
> > >>>>>> the
> > >>>>>>>> config, compared to in the program API. /cc Aljoscha
> > >>>>>>>>
> > >>>>>>>>   - My recommendation would be to keep this as simple as
> > >> possible.
> > >>>>> This
> > >>>>>>>> will make a lot of configuration code harder, and make it harder
> > >>> for
> > >>>>>> users
> > >>>>>>>> to understand Flink's memory model.
> > >>>>>>>>     Making things as easy for users to understand is very
> > >>> important
> > >>>>> in
> > >>>>>> my
> > >>>>>>>> opinion. In that regard, the main proposal in the FLIP seems
> > >> better
> > >>>>> than
> > >>>>>>>> the alternative proposal listed at the end of the FLIP page.
> > >>>>>>>>
> > >>>>>>>>   - For the simplicity, we could go even further and simply have
> > >>> two
> > >>>>>> memory
> > >>>>>>>> users at the moment: The operator algorithm/data-structure and
> > >> the
> > >>>>>> external
> > >>>>>>>> language process (Python for now).
> > >>>>>>>>     We never have batch algos and RocksDB mixed, having this as
> > >>>>>> separate
> > >>>>>>>> options is confusing as it suggests this can be combined
> > >>>>> arbitrarily. I
> > >>>>>>>> also think that a slim possibility that we may ever combine this
> > >> in
> > >>>>> the
> > >>>>>>>> future is not enough reason to make it more complex/confusing.
> > >>>>>>>>
> > >>>>>>>>   - I am also not aware of any plans to combine the network and
> > >>>>>> operator
> > >>>>>>>> memory. Not that it would be infeasible to do this, but I think
> > >>> this
> > >>>>>> would
> > >>>>>>>> also be orthogonal to this change, and I am not sure this would
> > >> be
> > >>>>>> solved
> > >>>>>>>> with static weights. So trying to get network memory into this
> > >>>>> proposal
> > >>>>>>>> seems premature to me.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Stephan
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <
> > >>> tonysong820@gmail.com
> > >>>>>>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> A quick question, is network memory treated as managed memory
> > >>>>>>>>>> now? Or in the future?
> > >>>>>>>>>>
> > >>>>>>>>> No, network memory is independent from managed memory ATM. And
> > >> I'm
> > >>>>> not
> > >>>>>>>>> aware of any plan to combine these two.
> > >>>>>>>>>
> > >>>>>>>>> Any insights there?
> > >>>>>>>>>
> > >>>>>>>>> Thank you~
> > >>>>>>>>>
> > >>>>>>>>> Xintong Song
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <yk...@gmail.com>
> > >>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> A quick question, is network memory treated as managed memory
> > >>>>>>>>>> now? Or in the future?
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Kurt
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <
> > >>>>> tonysong820@gmail.com>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi devs,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I'd like to bring the discussion over FLIP-141[1], which
> > >>> proposes
> > >>>>> how
> > >>>>>>>>>>> managed memory should be shared by various use cases within a
> > >>>>> slot.
> > >>>>>>>>> This
> > >>>>>>>>>> is
> > >>>>>>>>>>> an extension to FLIP-53[2], where we assumed that RocksDB
> > >> state
> > >>>>>>>> backend
> > >>>>>>>>>> and
> > >>>>>>>>>>> batch operators are the only use cases of managed memory for
> > >>>>>>>> streaming
> > >>>>>>>>>> and
> > >>>>>>>>>>> batch jobs respectively, which is no longer true with the
> > >>>>>>>> introduction
> > >>>>>>>>> of
> > >>>>>>>>>>> Python UDFs.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Please notice that we have not reached consensus between two
> > >>>>>>>> different
> > >>>>>>>>>>> designs. The major part of this FLIP describes one of the
> > >>>>> candidates,
> > >>>>>>>>>> while
> > >>>>>>>>>>> the alternative is discussed in the section "Rejected
> > >>>>> Alternatives".
> > >>>>>>>> We
> > >>>>>>>>>> are
> > >>>>>>>>>>> hoping to borrow intelligence from the community to help us
> > >>>>> resolve
> > >>>>>>>> the
> > >>>>>>>>>>> disagreement.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Any feedback would be appreciated.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thank you~
> > >>>>>>>>>>>
> > >>>>>>>>>>> Xintong Song
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> [1]
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> > >>>>>>>>>>>
> > >>>>>>>>>>> [2]
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> >
>

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Posted by Zhu Zhu <re...@gmail.com>.
Thanks for proposing this improvement! @Xintong
The proposal looks good to me. Agreed that we should make it as simple as
possible for users to understand.

Thanks,
Zhu

Dian Fu <di...@gmail.com> wrote on Thursday, September 3, 2020 at 2:11 PM:

> Thanks for driving this FLIP, Xintong! +1 to the updated version.
>
> > On September 2, 2020, at 6:09 PM, Xintong Song <to...@gmail.com> wrote:
> >
> > Thanks for the input, Yu.
> >
> > I believe the current proposal should work with RocksDB, or any other
> > state backend, using memory at either the slot or the operator scope.
> > With the proposed approach, all we need is an indicator (e.g., a
> > configuration option) telling us which scope we should calculate the
> > fractions for.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Wed, Sep 2, 2020 at 4:53 PM Yu Li <ca...@gmail.com> wrote:
> >
> >> Thanks for compiling the FLIP Xintong, and +1 for the updated doc.
> >>
> >> Just one supplement for the RocksDB state backend part:
> >>
> >> It's true that currently we're using managed memory at the slot scope.
> >> However, IMHO, we may support setting weights for different stateful
> >> operators (for advanced usage) in the future. For example, users may
> >> choose to set higher weights for the join operator over the aggregation
> >> operator, to give
> >> more memory to those with bigger states. In this case, we may also use
> >> managed memory at the operator scope for state backends. And if I
> >> understand correctly, the current design could cover this case well.
> >>
> >> Best Regards,
> >> Yu
> >>
> >>
> >> On Wed, 2 Sep 2020 at 15:39, Xintong Song <to...@gmail.com>
> wrote:
> >>
> >>> Thanks all for the feedback and discussion.
> >>>
> >>> I have updated the FLIP, with the following changes.
> >>>
> >>>   - Choose the main proposal over the alternative approach
> >>>   - Combine weights of RocksDB and batch operators
> >>>   - Expose weights through configuration options, rather than via
> >>>   ExecutionConfig.
> >>>   - Add implementation plan.
> >>>
> >>> Please help take another look.
> >>>
> >>> Thank you~
> >>>
> >>> Xintong Song
> >>>
> >>>
> >>>
> >>> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <to...@gmail.com>
> >> wrote:
> >>>
> >>>> Thanks for the inputs, Aljoscha & Till.
> >>>>
> >>>>
> >>>> # Weight Configuration
> >>>>
> >>>>
> >>>> I think exposing the knobs incrementally is a good idea. However, I'm
> >> not
> >>>> sure about non-configurable as the first step.
> >>>>
> >>>>
> >>>> Currently, users can tune memory for rocksdb
> >>>> ('taskmanager.memory.managed.size') and python
> >>>> ('python.fn-execution.[framework|buffer].memory.size') separately,
> >> which
> >>>> practically means any combination of rocksdb and python memory sizes.
> >> If
> >>> we
> >>>> switch to non-configurable weights, that will be a regression compared
> >> to
> >>>> 1.11.
> >>>>
> >>>>
> >>>> Therefore, I think exposing via configuration options might be a good
> >>>> first step. And we can discuss exposing via ExecutionConfig if later
> we
> >>> see
> >>>> that requirement.
> >>>>
> >>>>
> >>>> # Naming of Weights
> >>>>
> >>>>
> >>>> I'm neutral on "Flink/Internal memory".
> >>>>
> >>>>
> >>>> I think the reason we can combine weights for batch algorithms and
> >> state
> >>>> backends is that they are never mixed together. My only concern
> >>>> for "Flink/Internal memory", which might not be a problem at the
> >> moment,
> >>> is
> >>>> that what if new memory use cases appear in the future, which can also
> >> be
> >>>> described by "Flink/Internal memory" but is not guaranteed not mixed
> >> with
> >>>> batch algorithms or state backends?
> >>>>
> >>>>
> >>>> Anyway, I think the naming should not block this FLIP, as long as we
> >> have
> >>>> consensus on combining the two weights for rocksdb and batch
> >> algorithms.
> >>> We
> >>>> can keep the naming discussion open until the implementation phase.
> >>>>
> >>>>
> >>>> Thank you~
> >>>>
> >>>> Xintong Song
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <tr...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Thanks for creating this FLIP Xintong.
> >>>>>
> >>>>> I agree with the previous comments that the memory configuration
> >> should
> >>> be
> >>>>> as easy as possible. Every new knob has the potential to confuse
> users
> >>>>> and/or allows them to shoot themselves in the foot. Consequently, I am +1
> >>> for
> >>>>> the first proposal in the FLIP since it is simpler.
> >>>>>
> >>>>> Also +1 for Stephan's proposal to combine batch operator's and
> >>>>> RocksDB's memory usage into one weight.
> >>>>>
> >>>>> Concerning the names for the two weights, I fear that we are facing
> >> one
> >>> of
> >>>>> the two hard things in computer science. To add another idea, we
> could
> >>>>> name
> >>>>> them "Flink memory"/"Internal memory" and "Python memory".
> >>>>>
> >>>>> For the sake of making the scope of the FLIP as small as possible and
> >> to
> >>>>> develop the feature incrementally, I think that Aljoscha's proposal
> to
> >>>>> make
> >>>>> it non-configurable for the first step sounds like a good idea. As a
> >>> next
> >>>>> step (and also if we see need), we can make the memory weights
> >>>>> configurable
> >>>>> via the configuration. And last, we could expose it via the
> >>>>> ExecutionConfig
> >>>>> if it is required.
> >>>>>
> >>>>> Cheers,
> >>>>> Till
> >>>>>
> >>>>> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <aljoscha@apache.org
> >
> >>>>> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> playing devils advocate here: should we even make the memory weights
> >>>>>> configurable? We could go with weights that should make sense for
> >> most
> >>>>>> cases in the first version and only introduce configurable weights
> >>> when
> >>>>>> (if) users need them.
> >>>>>>
> >>>>>> Regarding where/how things are configured, I think that most things
> >>>>>> should be a ConfigOption first (Thanks cc'in me, Stephan!). This
> >> makes
> >>>>>> them configurable via flink-conf.yaml and via command line
> >> parameters,
> >>>>>> for example "bin/flink run -D memory.foo=bla ...". We can think
> >> about
> >>>>>> offering programmatic API for cases where it makes sense, of course.
> >>>>>>
> >>>>>> Regarding naming one of the configurable weights
> >>>>>> "StateBackend-BatchAlgorithm". I think it's not a good idea to be
> >> that
> >>>>>> specific because the option will not age well. For example when we
> >>> want
> >>>>>> to change which group of memory consumers are configured together or
> >>>>>> when we add something new.
> >>>>>>
> >>>>>> Best,
> >>>>>> Aljoscha
> >>>>>>
> >>>>>> On 31.08.20 08:13, Xintong Song wrote:
> >>>>>>> Thanks for the feedback, @Stephan
> >>>>>>>
> >>>>>>>
> >>>>>>>   - There is a push to make as much as possible configurable via
> >>> the
> >>>>>> main
> >>>>>>>> configuration, and not only in code. Specifically values for
> >>>>> operations
> >>>>>> and
> >>>>>>>> tuning.
> >>>>>>>>     I think it would be more important to have such memory
> >> weights
> >>>>> in
> >>>>>> the
> >>>>>>>> config, compared to in the program API. /cc Aljoscha
> >>>>>>>
> >>>>>>>
> >>>>>>> I can see the benefit that having memory weights in the main
> >>>>>> configuration
> >>>>>>> makes tuning easier, which makes great sense to me. On the other
> >>> hand,
> >>>>>> what
> >>>>>>> we lose is the flexibility to have different weights for jobs
> >>> running
> >>>>> in
> >>>>>>> the same Flink cluster. It seems to me the problem is that we
> >> don't
> >>>>> have
> >>>>>> an
> >>>>>>> easy way to overwrite job-specific configurations without touching
> >>> the
> >>>>>>> codes.
> >>>>>>>
> >>>>>>>
> >>>>>>> Given the current status, what if we make the memory weights
> >>>>> configurable
> >>>>>>> through both the main configuration and the programming API? The
> >>> main
> >>>>>>> configuration should take effect iff the weights are not
> >> explicitly
> >>>>>>> specified through the programming API. In this way, job cluster
> >>> users
> >>>>> can
> >>>>>>> easily tune the weight through the main configuration, while
> >> session
> >>>>>>> cluster users, if they want to have different weights for jobs,
> >> can
> >>>>> still
> >>>>>>> overwrite the weight through execution configs.
> >>>>>>>
> >>>>>>>
> >>>>>>>   - My recommendation would be to keep this as simple as
> >> possible.
> >>>>> This
> >>>>>>>> will make a lot of configuration code harder, and make it harder
> >>> for
> >>>>>> users
> >>>>>>>> to understand Flink's memory model.
> >>>>>>>>     Making things as easy for users to understand is very
> >>> important
> >>>>> in
> >>>>>> my
> >>>>>>>> opinion. In that regard, the main proposal in the FLIP seems
> >> better
> >>>>> than
> >>>>>>>> the alternative proposal listed at the end of the FLIP page.
> >>>>>>>
> >>>>>>> +1 from my side.
> >>>>>>>
> >>>>>>>
> >>>>>>>   - For the simplicity, we could go even further and simply have
> >>> two
> >>>>>> memory
> >>>>>>>> users at the moment: The operator algorithm/data-structure and
> >> the
> >>>>>> external
> >>>>>>>> language process (Python for now).
> >>>>>>>>     We never have batch algos and RocksDB mixed, having this as
> >>>>>> separate
> >>>>>>>> options is confusing as it suggests this can be combined
> >>>>> arbitrarily. I
> >>>>>>>> also think that a slim possibility that we may ever combine this
> >> in
> >>>>> the
> >>>>>>>> future is not enough reason to make it more complex/confusing.
> >>>>>>>
> >>>>>>>
> >>>>>>> Good point. +1 for combining batch/rocksdb weights, for they're
> >>>>>>> never mixed together. We can even just name it
> >>>>>>> "StateBackend-BatchAlgorithm" to be explicit.
> >>>>>>>
> >>>>>>>
> >>>>>>> For "external language process", I'm not entirely sure. Future
> >>>>> external
> >>>>>>> languages are possibly mixed with python processes. To avoid later
> >>>>>>> considering how to share external language memory across different
> >>>>>>> languages, I would suggest to present the concept as "python
> >> memory"
> >>>>>> rather
> >>>>>>> than "external language process memory".
> >>>>>>>
> >>>>>>>
> >>>>>>> Thank you~
> >>>>>>>
> >>>>>>> Xintong Song
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org>
> >>>>> wrote:
> >>>>>>>
> >>>>>>>> Thanks for driving this proposal. A few thoughts on the current
> >>>>> design:
> >>>>>>>>
> >>>>>>>>   - There is a push to make as much as possible configurable via
> >>> the
> >>>>>> main
> >>>>>>>> configuration, and not only in code. Specifically values for
> >>>>> operations
> >>>>>> and
> >>>>>>>> tuning.
> >>>>>>>>     I think it would be more important to have such memory
> >> weights
> >>>>> in
> >>>>>> the
> >>>>>>>> config, compared to in the program API. /cc Aljoscha
> >>>>>>>>
> >>>>>>>>   - My recommendation would be to keep this as simple as
> >> possible.
> >>>>> This
> >>>>>>>> will make a lot of configuration code harder, and make it harder
> >>> for
> >>>>>> users
> >>>>>>>> to understand Flink's memory model.
> >>>>>>>>     Making things as easy for users to understand is very
> >>> important
> >>>>> in
> >>>>>> my
> >>>>>>>> opinion. In that regard, the main proposal in the FLIP seems
> >> better
> >>>>> than
> >>>>>>>> the alternative proposal listed at the end of the FLIP page.
> >>>>>>>>
> >>>>>>>>   - For the simplicity, we could go even further and simply have
> >>> two
> >>>>>> memory
> >>>>>>>> users at the moment: The operator algorithm/data-structure and
> >> the
> >>>>>> external
> >>>>>>>> language process (Python for now).
> >>>>>>>>     We never have batch algos and RocksDB mixed, having this as
> >>>>>> separate
> >>>>>>>> options is confusing as it suggests this can be combined
> >>>>> arbitrarily. I
> >>>>>>>> also think that a slim possibility that we may ever combine this
> >> in
> >>>>> the
> >>>>>>>> future is not enough reason to make it more complex/confusing.
> >>>>>>>>
> >>>>>>>>   - I am also not aware of any plans to combine the network and
> >>>>>> operator
> >>>>>>>> memory. Not that it would be infeasible to do this, but I think
> >>> this
> >>>>>> would
> >>>>>>>> also be orthogonal to this change, and I am not sure this would
> >> be
> >>>>>> solved
> >>>>>>>> with static weights. So trying to get network memory into this
> >>>>> proposal
> >>>>>>>> seems premature to me.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Stephan
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <
> >>> tonysong820@gmail.com
> >>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> A quick question, is network memory treated as managed memory
> >>>>>>>>>> now? Or in the future?
> >>>>>>>>>>
> >>>>>>>>> No, network memory is independent from managed memory ATM. And
> >> I'm
> >>>>> not
> >>>>>>>>> aware of any plan to combine these two.
> >>>>>>>>>
> >>>>>>>>> Any insights there?
> >>>>>>>>>
> >>>>>>>>> Thank you~
> >>>>>>>>>
> >>>>>>>>> Xintong Song
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <yk...@gmail.com>
> >>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> A quick question, is network memory treated as managed memory
> >>>>>>>>>> now? Or in the future?
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Kurt
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <
> >>>>> tonysong820@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi devs,
> >>>>>>>>>>>
> >>>>>>>>>>> I'd like to bring the discussion over FLIP-141[1], which
> >>> proposes
> >>>>> how
> >>>>>>>>>>> managed memory should be shared by various use cases within a
> >>>>> slot.
> >>>>>>>>> This
> >>>>>>>>>> is
> >>>>>>>>>>> an extension to FLIP-53[2], where we assumed that RocksDB
> >> state
> >>>>>>>> backend
> >>>>>>>>>> and
> >>>>>>>>>>> batch operators are the only use cases of managed memory for
> >>>>>>>> streaming
> >>>>>>>>>> and
> >>>>>>>>>>> batch jobs respectively, which is no longer true with the
> >>>>>>>> introduction
> >>>>>>>>> of
> >>>>>>>>>>> Python UDFs.
> >>>>>>>>>>>
> >>>>>>>>>>> Please notice that we have not reached consensus between two
> >>>>>>>> different
> >>>>>>>>>>> designs. The major part of this FLIP describes one of the
> >>>>> candidates,
> >>>>>>>>>> while
> >>>>>>>>>>> the alternative is discussed in the section "Rejected
> >>>>> Alternatives".
> >>>>>>>> We
> >>>>>>>>>> are
> >>>>>>>>>>> hoping to borrow intelligence from the community to help us
> >>>>> resolve
> >>>>>>>> the
> >>>>>>>>>>> disagreement.
> >>>>>>>>>>>
> >>>>>>>>>>> Any feedback would be appreciated.
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you~
> >>>>>>>>>>>
> >>>>>>>>>>> Xintong Song
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> [1]
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> >>>>>>>>>>>
> >>>>>>>>>>> [2]
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Posted by Dian Fu <di...@gmail.com>.
Thanks for driving this FLIP, Xintong! +1 to the updated version.

> On September 2, 2020, at 6:09 PM, Xintong Song <to...@gmail.com> wrote:
> 
> Thanks for the input, Yu.
> 
> I believe the current proposal should work with RocksDB, or any other state
> backend, using memory at either the slot or the operator scope. With the
> proposed approach, all we need is an indicator (e.g., a configuration
> option) telling us which scope we should calculate the fractions for.
> 
> Thank you~
> 
> Xintong Song
> 
> 
> 
> On Wed, Sep 2, 2020 at 4:53 PM Yu Li <ca...@gmail.com> wrote:
> 
>> Thanks for compiling the FLIP Xintong, and +1 for the updated doc.
>> 
>> Just one supplement for the RocksDB state backend part:
>> 
>> It's true that currently we're using managed memory at the slot scope.
>> However, IMHO, we may support setting weights for different stateful
>> operators (for advanced usage) in the future. For example, users may
>> choose to set higher weights for the join operator over the aggregation
>> operator, to give
>> more memory to those with bigger states. In this case, we may also use
>> managed memory at the operator scope for state backends. And if I
>> understand correctly, the current design could cover this case well.
>> 
>> Best Regards,
>> Yu
>> 
>> 
>> On Wed, 2 Sep 2020 at 15:39, Xintong Song <to...@gmail.com> wrote:
>> 
>>> Thanks all for the feedback and discussion.
>>> 
>>> I have updated the FLIP, with the following changes.
>>> 
>>>   - Choose the main proposal over the alternative approach
>>>   - Combine weights of RocksDB and batch operators
>>>   - Expose weights through configuration options, rather than via
>>>   ExecutionConfig.
>>>   - Add implementation plan.
>>> 
>>> Please help take another look.
>>> 
>>> Thank you~
>>> 
>>> Xintong Song
>>> 
>>> 
>>> 
>>> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <to...@gmail.com>
>> wrote:
>>> 
>>>> Thanks for the inputs, Aljoscha & Till.
>>>> 
>>>> 
>>>> # Weight Configuration
>>>> 
>>>> 
>>>> I think exposing the knobs incrementally is a good idea. However, I'm
>> not
>>>> sure about non-configurable as the first step.
>>>> 
>>>> 
>>>> Currently, users can tune memory for rocksdb
>>>> ('taskmanager.memory.managed.size') and python
>>>> ('python.fn-execution.[framework|buffer].memory.size') separately,
>> which
>>>> practically means any combination of rocksdb and python memory sizes.
>> If
>>> we
>>>> switch to non-configurable weights, that will be a regression compared
>> to
>>>> 1.11.
>>>> 
>>>> 
>>>> Therefore, I think exposing via configuration options might be a good
>>>> first step. And we can discuss exposing via ExecutionConfig if later we
>>> see
>>>> that requirement.
>>>> 
>>>> 
>>>> # Naming of Weights
>>>> 
>>>> 
>>>> I'm neutral on "Flink/Internal memory".
>>>> 
>>>> 
>>>> I think the reason we can combine weights for batch algorithms and
>> state
>>>> backends is that they are never mixed together. My only concern
>>>> for "Flink/Internal memory", which might not be a problem at the
>> moment,
>>> is
>>>> that what if new memory use cases appear in the future, which can also
>> be
>>>> described by "Flink/Internal memory" but is not guaranteed not mixed
>> with
>>>> batch algorithms or state backends?
>>>> 
>>>> 
>>>> Anyway, I think the naming should not block this FLIP, as long as we
>> have
>>>> consensus on combining the two weights for rocksdb and batch
>> algorithms.
>>> We
>>>> can keep the naming discussion open until the implementation phase.
>>>> 
>>>> 
>>>> Thank you~
>>>> 
>>>> Xintong Song
>>>> 
>>>> 
>>>> 
>>>> On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <tr...@apache.org>
>>>> wrote:
>>>> 
>>>>> Thanks for creating this FLIP Xintong.
>>>>> 
>>>>> I agree with the previous comments that the memory configuration
>> should
>>> be
>>>>> as easy as possible. Every new knob has the potential to confuse users
>>>>> and/or allows them to shoot themselves in the foot. Consequently, I am +1
>>> for
>>>>> the first proposal in the FLIP since it is simpler.
>>>>> 
>>>>> Also +1 for Stephan's proposal to combine batch operator's and
>>>>> RocksDB's memory usage into one weight.
>>>>> 
>>>>> Concerning the names for the two weights, I fear that we are facing
>> one
>>> of
>>>>> the two hard things in computer science. To add another idea, we could
>>>>> name
>>>>> them "Flink memory"/"Internal memory" and "Python memory".
>>>>> 
>>>>> For the sake of making the scope of the FLIP as small as possible and
>> to
>>>>> develop the feature incrementally, I think that Aljoscha's proposal to
>>>>> make
>>>>> it non-configurable for the first step sounds like a good idea. As a
>>> next
>>>>> step (and also if we see need), we can make the memory weights
>>>>> configurable
>>>>> via the configuration. And last, we could expose it via the
>>>>> ExecutionConfig
>>>>> if it is required.
>>>>> 
>>>>> Cheers,
>>>>> Till
>>>>> 
>>>>> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <al...@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> playing devils advocate here: should we even make the memory weights
>>>>>> configurable? We could go with weights that should make sense for
>> most
>>>>>> cases in the first version and only introduce configurable weights
>>> when
>>>>>> (if) users need them.
>>>>>> 
>>>>>> Regarding where/how things are configured, I think that most things
>>>>>> should be a ConfigOption first (Thanks cc'in me, Stephan!). This
>> makes
>>>>>> them configurable via flink-conf.yaml and via command line
>> parameters,
>>>>>> for example "bin/flink run -D memory.foo=bla ...". We can think
>> about
>>>>>> offering programmatic API for cases where it makes sense, of course.
>>>>>> 
>>>>>> Regarding naming one of the configurable weights
>>>>>> "StateBackend-BatchAlgorithm". I think it's not a good idea to be
>> that
>>>>>> specific because the option will not age well. For example when we
>>> want
>>>>>> to change which group of memory consumers are configured together or
>>>>>> when we add something new.
>>>>>> 
>>>>>> Best,
>>>>>> Aljoscha
>>>>>> 
>>>>>> On 31.08.20 08:13, Xintong Song wrote:
>>>>>>> Thanks for the feedback, @Stephan
>>>>>>> 
>>>>>>> 
>>>>>>>   - There is a push to make as much as possible configurable via
>>> the
>>>>>> main
>>>>>>>> configuration, and not only in code. Specifically values for
>>>>> operations
>>>>>> and
>>>>>>>> tuning.
>>>>>>>>     I think it would be more important to have such memory
>> weights
>>>>> in
>>>>>> the
>>>>>>>> config, compared to in the program API. /cc Aljoscha
>>>>>>> 
>>>>>>> 
>>>>>>> I can see the benefit that having memory weights in the main
>>>>>> configuration
>>>>>>> makes tuning easier, which makes great sense to me. On the other
>>> hand,
>>>>>> what
>>>>>>> we lose is the flexibility to have different weights for jobs
>>> running
>>>>> in
>>>>>>> the same Flink cluster. It seems to me the problem is that we
>> don't
>>>>> have
>>>>>> an
>>>>>>> easy way to overwrite job-specific configurations without touching
>>> the
>>>>>>> codes.
>>>>>>> 
>>>>>>> 
>>>>>>> Given the current status, what if we make the memory weights
>>>>> configurable
>>>>>>> through both the main configuration and the programming API? The
>>> main
>>>>>>> configuration should take effect iff the weights are not
>> explicitly
>>>>>>> specified through the programming API. In this way, job cluster
>>> users
>>>>> can
>>>>>>> easily tune the weight through the main configuration, while
>> session
>>>>>>> cluster users, if they want to have different weights for jobs,
>> can
>>>>> still
>>>>>>> overwrite the weight through execution configs.
>>>>>>> 
>>>>>>> 
>>>>>>>   - My recommendation would be to keep this as simple as
>> possible.
>>>>> This
>>>>>>>> will make a lot of configuration code harder, and make it harder
>>> for
>>>>>> users
>>>>>>>> to understand Flink's memory model.
>>>>>>>>     Making things as easy for users to understand is very
>>> important
>>>>> in
>>>>>> my
>>>>>>>> opinion. In that regard, the main proposal in the FLIP seems
>> better
>>>>> than
>>>>>>>> the alternative proposal listed at the end of the FLIP page.
>>>>>>> 
>>>>>>> +1 from my side.
>>>>>>> 
>>>>>>> 
>>>>>>>   - For the simplicity, we could go even further and simply have
>>> two
>>>>>> memory
>>>>>>>> users at the moment: The operator algorithm/data-structure and
>> the
>>>>>> external
>>>>>>>> language process (Python for now).
>>>>>>>>     We never have batch algos and RocksDB mixed, having this as
>>>>>> separate
>>>>>>>> options is confusing as it suggests this can be combined
>>>>> arbitrarily. I
>>>>>>>> also think that a slim possibility that we may ever combine this
>> in
>>>>> the
>>>>>>>> future is not enough reason to make it more complex/confusing.
>>>>>>> 
>>>>>>> 
>>>>>>> Good point. +1 for combining batch/rocksdb weights, for they're
>>>>>>> never mixed together. We can even just name it
>>>>>>> "StateBackend-BatchAlgorithm" to be explicit.
>>>>>>> 
>>>>>>> 
>>>>>>> For "external language process", I'm not entirely sure. Future
>>>>> external
>>>>>>> languages are possibly mixed with python processes. To avoid later
>>>>>>> considering how to share external language memory across different
>>>>>>> languages, I would suggest to present the concept as "python
>> memory"
>>>>>> rather
>>>>>>> than "external language process memory".
>>>>>>> 
>>>>>>> 
>>>>>>> Thank you~
>>>>>>> 
>>>>>>> Xintong Song
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org>
>>>>> wrote:
>>>>>>> 
>>>>>>>> Thanks for driving this proposal. A few thoughts on the current
>>>>> design:
>>>>>>>> 
>>>>>>>>   - There is a push to make as much as possible configurable via
>>> the
>>>>>> main
>>>>>>>> configuration, and not only in code. Specifically values for
>>>>> operations
>>>>>> and
>>>>>>>> tuning.
>>>>>>>>     I think it would be more important to have such memory
>> weights
>>>>> in
>>>>>> the
>>>>>>>> config, compared to in the program API. /cc Aljoscha
>>>>>>>> 
>>>>>>>>   - My recommendation would be to keep this as simple as
>> possible.
>>>>> This
>>>>>>>> will make a lot of configuration code harder, and make it harder
>>> for
>>>>>> users
>>>>>>>> to understand Flink's memory model.
>>>>>>>>     Making things as easy for users to understand is very
>>> important
>>>>> in
>>>>>> my
>>>>>>>> opinion. In that regard, the main proposal in the FLIP seems
>> better
>>>>> than
>>>>>>>> the alternative proposal listed at the end of the FLIP page.
>>>>>>>> 
>>>>>>>>   - For the simplicity, we could go even further and simply have
>>> two
>>>>>> memory
>>>>>>>> users at the moment: The operator algorithm/data-structure and
>> the
>>>>>> external
>>>>>>>> language process (Python for now).
>>>>>>>>     We never have batch algos and RocksDB mixed, having this as
>>>>>> separate
>>>>>>>> options is confusing as it suggests this can be combined
>>>>> arbitrarily. I
>>>>>>>> also think that a slim possibility that we may ever combine this
>> in
>>>>> the
>>>>>>>> future is not enough reason to make it more complex/confusing.
>>>>>>>> 
>>>>>>>>   - I am also not aware of any plans to combine the network and
>>>>>> operator
>>>>>>>> memory. Not that it would be infeasible to do this, but I think
>>> this
>>>>>> would
>>>>>>>> also be orthogonal to this change, and I am not sure this would
>> be
>>>>>> solved
>>>>>>>> with static weights. So trying to get network memory into this
>>>>> proposal
>>>>>>>> seems premature to me.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Stephan
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <
>>> tonysong820@gmail.com
>>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> A quick question, is network memory treated as managed memory
>>>>>>>>>> now? Or in the future?
>>>>>>>>>> 
>>>>>>>>> No, network memory is independent from managed memory ATM. And
>> I'm
>>>>> not
>>>>>>>>> aware of any plan to combine these two.
>>>>>>>>> 
>>>>>>>>> Any insights there?
>>>>>>>>> 
>>>>>>>>> Thank you~
>>>>>>>>> 
>>>>>>>>> Xintong Song
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <yk...@gmail.com>
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> A quick question, is network memory treated as managed memory
>>>>>>>>>> now? Or in the future?
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Kurt
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <
>>>>> tonysong820@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi devs,
>>>>>>>>>>> 
>>>>>>>>>>> I'd like to bring the discussion over FLIP-141[1], which
>>> proposes
>>>>> how
>>>>>>>>>>> managed memory should be shared by various use cases within a
>>>>> slot.
>>>>>>>>> This
>>>>>>>>>> is
>>>>>>>>>>> an extension to FLIP-53[2], where we assumed that RocksDB
>> state
>>>>>>>> backend
>>>>>>>>>> and
>>>>>>>>>>> batch operators are the only use cases of managed memory for
>>>>>>>> streaming
>>>>>>>>>> and
>>>>>>>>>>> batch jobs respectively, which is no longer true with the
>>>>>>>> introduction
>>>>>>>>> of
>>>>>>>>>>> Python UDFs.
>>>>>>>>>>> 
>>>>>>>>>>> Please notice that we have not reached consensus between two
>>>>>>>> different
>>>>>>>>>>> designs. The major part of this FLIP describes one of the
>>>>> candidates,
>>>>>>>>>> while
>>>>>>>>>>> the alternative is discussed in the section "Rejected
>>>>> Alternatives".
>>>>>>>> We
>>>>>>>>>> are
>>>>>>>>>>> hoping to borrow intelligence from the community to help us
>>>>> resolve
>>>>>>>> the
>>>>>>>>>>> disagreement.
>>>>>>>>>>> 
>>>>>>>>>>> Any feedback would be appreciated.
>>>>>>>>>>> 
>>>>>>>>>>> Thank you~
>>>>>>>>>>> 
>>>>>>>>>>> Xintong Song
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> [1]
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
>>>>>>>>>>> 
>>>>>>>>>>> [2]
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 


Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Posted by Xintong Song <to...@gmail.com>.
Thanks for the input, Yu.

I believe the current proposal should work with RocksDB, or any other state
backend, using memory at either the slot or the operator scope. With the
proposed approach, all we need is an indicator (e.g., a configuration
option) telling us which scope we should calculate the fractions for.
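
For illustration, here is a minimal sketch of what such an indicator could
look like as an enum-typed ConfigOption. The class name, option key, enum,
and default below are hypothetical placeholders rather than anything the
FLIP defines; only the ConfigOptions builder pattern is existing Flink API:

    import org.apache.flink.configuration.ConfigOption;
    import org.apache.flink.configuration.ConfigOptions;

    public class ManagedMemoryOptions {

        /** Hypothetical scope at which managed memory fractions are calculated. */
        public enum FractionScope {
            SLOT,
            OPERATOR
        }

        /** Hypothetical option key, purely for illustration. */
        public static final ConfigOption<FractionScope> FRACTION_SCOPE =
                ConfigOptions.key("taskmanager.memory.managed.fraction-scope")
                        .enumType(FractionScope.class)
                        .defaultValue(FractionScope.SLOT)
                        .withDescription(
                                "The scope (slot or operator) at which managed"
                                        + " memory fractions are calculated for"
                                        + " state backends.");
    }

Defined this way, the indicator would be settable through flink-conf.yaml
like any other option, and the fraction calculation could simply switch on
its value.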

Thank you~

Xintong Song



On Wed, Sep 2, 2020 at 4:53 PM Yu Li <ca...@gmail.com> wrote:

> Thanks for compiling the FLIP Xintong, and +1 for the updated doc.
>
> Just one supplement for the RocksDB state backend part:
>
> It's true that currently we're using managed memory at the slot scope.
> However, IMHO, we may support setting weights for different stateful
> operators (for advanced usage) in the future. For example, users may
> choose to set higher weights for the join operator over the aggregation
> operator, to give
> more memory to those with bigger states. In this case, we may also use
> managed memory at the operator scope for state backends. And if I
> understand correctly, the current design could cover this case well.
>
> Best Regards,
> Yu
>
>
> On Wed, 2 Sep 2020 at 15:39, Xintong Song <to...@gmail.com> wrote:
>
> > Thanks all for the feedback and discussion.
> >
> > I have updated the FLIP, with the following changes.
> >
> >    - Choose the main proposal over the alternative approach
> >    - Combine weights of RocksDB and batch operators
> >    - Expose weights through configuration options, rather than via
> >    ExecutionConfig.
> >    - Add implementation plan.
> >
> > Please help take another look.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <to...@gmail.com>
> wrote:
> >
> > > Thanks for the inputs, Aljoscha & Till.
> > >
> > >
> > > # Weight Configuration
> > >
> > >
> > > I think exposing the knobs incrementally is a good idea. However, I'm
> not
> > > sure about non-configurable as the first step.
> > >
> > >
> > > Currently, users can tune memory for rocksdb
> > > ('taskmanager.memory.managed.size') and python
> > > ('python.fn-execution.[framework|buffer].memory.size') separately,
> which
> > > practically means any combination of rocksdb and python memory sizes.
> If
> > we
> > > switch to non-configurable weights, that will be a regression compared
> to
> > > 1.11.
> > >
> > >
> > > Therefore, I think exposing via configuration options might be a good
> > > first step. And we can discuss exposing via ExecutionConfig if later we
> > see
> > > that requirement.
> > >
> > >
> > > # Naming of Weights
> > >
> > >
> > > I'm neutral on "Flink/Internal memory".
> > >
> > >
> > > I think the reason we can combine weights for batch algorithms and
> state
> > > backends is that they are never mixed together. My only concern
> > > for "Flink/Internal memory", which might not be a problem at the
> moment,
> > is
> > > that what if new memory use cases appear in the future, which can also
> be
> > > described by "Flink/Internal memory" but is not guaranteed not mixed
> with
> > > batch algorithms or state backends?
> > >
> > >
> > > Anyway, I think the naming should not block this FLIP, as long as we
> have
> > > consensus on combining the two weights for rocksdb and batch
> algorithms.
> > We
> > > can keep the naming discussion open until the implementation phase.
> > >
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <tr...@apache.org>
> > > wrote:
> > >
> > >> Thanks for creating this FLIP Xintong.
> > >>
> > >> I agree with the previous comments that the memory configuration
> should
> > be
> > >> as easy as possible. Every new knob has the potential to confuse users
> > >> and/or allows them to shoot themselves in the foot. Consequently, I am +1
> > for
> > >> the first proposal in the FLIP since it is simpler.
> > >>
> > >> Also +1 for Stephan's proposal to combine batch operator's and
> > >> RocksDB's memory usage into one weight.
> > >>
> > >> Concerning the names for the two weights, I fear that we are facing
> one
> > of
> > >> the two hard things in computer science. To add another idea, we could
> > >> name
> > >> them "Flink memory"/"Internal memory" and "Python memory".
> > >>
> > >> For the sake of making the scope of the FLIP as small as possible and
> to
> > >> develop the feature incrementally, I think that Aljoscha's proposal to
> > >> make
> > >> it non-configurable for the first step sounds like a good idea. As a
> > next
> > >> step (and also if we see need), we can make the memory weights
> > >> configurable
> > >> via the configuration. And last, we could expose it via the
> > >> ExecutionConfig
> > >> if it is required.
> > >>
> > >> Cheers,
> > >> Till
> > >>
> > >> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <al...@apache.org>
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > playing devils advocate here: should we even make the memory weights
> > >> > configurable? We could go with weights that should make sense for
> most
> > >> > cases in the first version and only introduce configurable weights
> > when
> > >> > (if) users need them.
> > >> >
> > >> > Regarding where/how things are configured, I think that most things
> > >> > should be a ConfigOption first (Thanks cc'in me, Stephan!). This
> makes
> > >> > them configurable via flink-conf.yaml and via command line
> parameters,
> > >> > for example "bin/flink run -D memory.foo=bla ...". We can think
> about
> > >> > offering programmatic API for cases where it makes sense, of course.
> > >> >
> > >> > Regarding naming one of the configurable weights
> > >> > "StateBackend-BatchAlgorithm". I think it's not a good idea to be
> that
> > >> > specific because the option will not age well. For example when we
> > want
> > >> > to change which group of memory consumers are configured together or
> > >> > when we add something new.
> > >> >
> > >> > Best,
> > >> > Aljoscha
> > >> >
> > >> > On 31.08.20 08:13, Xintong Song wrote:
> > >> > > Thanks for the feedback, @Stephan
> > >> > >
> > >> > >
> > >> > >    - There is a push to make as much as possible configurable via
> > the
> > >> > main
> > >> > >> configuration, and not only in code. Specifically values for
> > >> operations
> > >> > and
> > >> > >> tuning.
> > >> > >>      I think it would be more important to have such memory
> weights
> > >> in
> > >> > the
> > >> > >> config, compared to in the program API. /cc Aljoscha
> > >> > >
> > >> > >
> > >> > > I can see the benefit that having memory weights in the main
> > >> > configuration
> > >> > > makes tuning easier, which makes great sense to me. On the other
> > hand,
> > >> > what
> > >> > > we lose is the flexibility to have different weights for jobs
> > running
> > >> in
> > >> > > the same Flink cluster. It seems to me the problem is that we
> don't
> > >> have
> > >> > an
> > >> > > easy way to overwrite job-specific configurations without touching
> > the
> > >> > > codes.
> > >> > >
> > >> > >
> > >> > > Given the current status, what if we make the memory weights
> > >> configurable
> > >> > > through both the main configuration and the programming API? The
> > main
> > >> > > configuration should take effect iff the weights are not
> explicitly
> > >> > > specified through the programming API. In this way, job cluster
> > users
> > >> can
> > >> > > easily tune the weight through the main configuration, while
> session
> > >> > > cluster users, if they want to have different weights for jobs,
> can
> > >> still
> > >> > > overwrite the weight through execution configs.
> > >> > >
> > >> > >
> > >> > >    - My recommendation would be to keep this as simple as
> possible.
> > >> This
> > >> > >> will make a lot of configuration code harder, and make it harder
> > for
> > >> > users
> > >> > >> to understand Flink's memory model.
> > >> > >>      Making things as easy for users to understand is very
> > important
> > >> in
> > >> > my
> > >> > >> opinion. In that regard, the main proposal in the FLIP seems
> better
> > >> than
> > >> > >> the alternative proposal listed at the end of the FLIP page.
> > >> > >
> > >> > > +1 from my side.
> > >> > >
> > >> > >
> > >> > >    - For the simplicity, we could go even further and simply have
> > two
> > >> > memory
> > >> > >> users at the moment: The operator algorithm/data-structure and
> the
> > >> > external
> > >> > >> language process (Python for now).
> > >> > >>      We never have batch algos and RocksDB mixed, having this as
> > >> > separate
> > >> > >> options is confusing as it suggests this can be combined
> > >> arbitrarily. I
> > >> > >> also think that a slim possibility that we may ever combine this
> in
> > >> the
> > >> > >> future is not enough reason to make it more complex/confusing.
> > >> > >
> > >> > >
> > >> > > Good point. +1 for combining batch/rocksdb weights, for they're
> > >> > > never mixed together. We can even just name it
> > >> > > "StateBackend-BatchAlgorithm" to be explicit.
> > >> > >
> > >> > >
> > >> > > For "external language process", I'm not entirely sure. Future
> > >> external
> > >> > > languages are possibly mixed with python processes. To avoid later
> > >> > > considering how to share external language memory across different
> > >> > > languages, I would suggest to present the concept as "python
> memory"
> > >> > rather
> > >> > > than "external language process memory".
> > >> > >
> > >> > >
> > >> > > Thank you~
> > >> > >
> > >> > > Xintong Song
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org>
> > >> wrote:
> > >> > >
> > >> > >> Thanks for driving this proposal. A few thoughts on the current
> > >> design:
> > >> > >>
> > >> > >>    - There is a push to make as much as possible configurable via
> > the
> > >> > main
> > >> > >> configuration, and not only in code. Specifically values for
> > >> operations
> > >> > and
> > >> > >> tuning.
> > >> > >>      I think it would be more important to have such memory
> weights
> > >> in
> > >> > the
> > >> > >> config, compared to in the program API. /cc Aljoscha
> > >> > >>
> > >> > >>    - My recommendation would be to keep this as simple as
> possible.
> > >> This
> > >> > >> will make a lot of configuration code harder, and make it harder
> > for
> > >> > users
> > >> > >> to understand Flink's memory model.
> > >> > >>      Making things as easy for users to understand is very
> > important
> > >> in
> > >> > my
> > >> > >> opinion. In that regard, the main proposal in the FLIP seems
> better
> > >> than
> > >> > >> the alternative proposal listed at the end of the FLIP page.
> > >> > >>
> > >> > >>    - For the simplicity, we could go even further and simply have
> > two
> > >> > memory
> > >> > >> users at the moment: The operator algorithm/data-structure and
> the
> > >> > external
> > >> > >> language process (Python for now).
> > >> > >>      We never have batch algos and RocksDB mixed, having this as
> > >> > separate
> > >> > >> options is confusing as it suggests this can be combined
> > >> arbitrarily. I
> > >> > >> also think that a slim possibility that we may ever combine this
> in
> > >> the
> > >> > >> future is not enough reason to make it more complex/confusing.
> > >> > >>
> > >> > >>    - I am also not aware of any plans to combine the network and
> > >> > operator
> > >> > >> memory. Not that it would be infeasible to do this, but I think
> > this
> > >> > would
> > >> > >> also be orthogonal to this change, and I am not sure this would
> be
> > >> > solved
> > >> > >> with static weights. So trying to get network memory into this
> > >> proposal
> > >> > >> seems premature to me.
> > >> > >>
> > >> > >> Best,
> > >> > >> Stephan
> > >> > >>
> > >> > >>
> > >> > >> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <
> > tonysong820@gmail.com
> > >> >
> > >> > >> wrote:
> > >> > >>
> > >> > >>>>
> > >> > >>>> A quick question, is network memory treated as managed memory
> > >> > >>>> now? Or in the future?
> > >> > >>>>
> > >> > >>> No, network memory is independent from managed memory ATM. And
> I'm
> > >> not
> > >> > >>> aware of any plan to combine these two.
> > >> > >>>
> > >> > >>> Any insights there?
> > >> > >>>
> > >> > >>> Thank you~
> > >> > >>>
> > >> > >>> Xintong Song
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <yk...@gmail.com>
> > >> wrote:
> > >> > >>>
> > >> > >>>> A quick question, is network memory treated as managed memory
> > >> > >>>> now? Or in the future?
> > >> > >>>>
> > >> > >>>> Best,
> > >> > >>>> Kurt
> > >> > >>>>
> > >> > >>>>
> > >> > >>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <
> > >> tonysong820@gmail.com>
> > >> > >>>> wrote:
> > >> > >>>>
> > >> > >>>>> Hi devs,
> > >> > >>>>>
> > >> > >>>>> I'd like to bring the discussion over FLIP-141[1], which
> > proposes
> > >> how
> > >> > >>>>> managed memory should be shared by various use cases within a
> > >> slot.
> > >> > >>> This
> > >> > >>>> is
> > >> > >>>>> an extension to FLIP-53[2], where we assumed that RocksDB
> state
> > >> > >> backend
> > >> > >>>> and
> > >> > >>>>> batch operators are the only use cases of managed memory for
> > >> > >> streaming
> > >> > >>>> and
> > >> > >>>>> batch jobs respectively, which is no longer true with the
> > >> > >> introduction
> > >> > >>> of
> > >> > >>>>> Python UDFs.
> > >> > >>>>>
> > >> > >>>>> Please notice that we have not reached consensus between two
> > >> > >> different
> > >> > >>>>> designs. The major part of this FLIP describes one of the
> > >> candidates,
> > >> > >>>> while
> > >> > >>>>> the alternative is discussed in the section "Rejected
> > >> Alternatives".
> > >> > >> We
> > >> > >>>> are
> > >> > >>>>> hoping to borrow intelligence from the community to help us
> > >> resolve
> > >> > >> the
> > >> > >>>>> disagreement.
> > >> > >>>>>
> > >> > >>>>> Any feedback would be appreciated.
> > >> > >>>>>
> > >> > >>>>> Thank you~
> > >> > >>>>>
> > >> > >>>>> Xintong Song
> > >> > >>>>>
> > >> > >>>>>
> > >> > >>>>> [1]
> > >> > >>>>>
> > >> > >>>>>
> > >> > >>>>
> > >> > >>>
> > >> > >>
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> > >> > >>>>>
> > >> > >>>>> [2]
> > >> > >>>>>
> > >> > >>>>>
> > >> > >>>>
> > >> > >>>
> > >> > >>
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> > >> > >>>>>
> > >> > >>>>
> > >> > >>>
> > >> > >>
> > >> > >
> > >> >
> > >> >
> > >>
> > >
> >
>

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Posted by Yu Li <ca...@gmail.com>.
Thanks for compiling the FLIP Xintong, and +1 for the updated doc.

Just one supplement for the RocksDB state backend part:

It's true that currently we're using managed memory at the slot scope.
However, IMHO, we may support setting weights for different stateful
operators (for advanced usage) in the future. For example, users may choose
to set higher weights for the join operator over the aggregation operator,
to give
more memory to those with bigger states. In this case, we may also use
managed memory at the operator scope for state backends. And if I
understand correctly, the current design could cover this case well.
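
To make that concrete with made-up numbers: if, within one slot, the join
operator had weight 2 and the aggregation operator weight 1, a
fraction-based calculation at the operator scope would give the join's
state backend 2 / (2 + 1), i.e. roughly 67%, of the slot's managed memory,
and the aggregation's state backend the remaining third.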

Best Regards,
Yu


On Wed, 2 Sep 2020 at 15:39, Xintong Song <to...@gmail.com> wrote:

> Thanks all for the feedback and discussion.
>
> I have updated the FLIP, with the following changes.
>
>    - Choose the main proposal over the alternative approach
>    - Combine weights of RocksDB and batch operators
>    - Expose weights through configuration options, rather than via
>    ExecutionConfig.
>    - Add implementation plan.
>
> Please help take another look.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <to...@gmail.com> wrote:
>
> > Thanks for the inputs, Aljoscha & Till.
> >
> >
> > # Weight Configuration
> >
> >
> > I think exposing the knobs incrementally is a good idea. However, I'm not
> > sure about non-configurable as the first step.
> >
> >
> > Currently, users can tune memory for rocksdb
> > ('taskmanager.memory.managed.size') and python
> > ('python.fn-execution.[framework|buffer].memory.size') separately, which
> > practically means any combination of rocksdb and python memory sizes. If
> we
> > switch to non-configurable weights, that will be a regression compared to
> > 1.11.
> >
> >
> > Therefore, I think exposing via configuration options might be a good
> > first step. And we can discuss exposing via ExecutionConfig if later we
> see
> > that requirement.
> >
> >
> > # Naming of Weights
> >
> >
> > I'm neutral on "Flink/Internal memory".
> >
> >
> > I think the reason we can combine weights for batch algorithms and state
> > backends is that they are never mixed together. My only concern
> > for "Flink/Internal memory", which might not be a problem at the moment,
> is
> > that what if new memory use cases appear in the future, which can also be
> > described by "Flink/Internal memory" but is not guaranteed not mixed with
> > batch algorithms or state backends?
> >
> >
> > Anyway, I think the naming should not block this FLIP, as long as we have
> > consensus on combining the two weights for rocksdb and batch algorithms.
> We
> > can keep the naming discussion open until the implementation phase.
> >
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <tr...@apache.org>
> > wrote:
> >
> >> Thanks for creating this FLIP Xintong.
> >>
> >> I agree with the previous comments that the memory configuration should
> be
> >> as easy as possible. Every new knob has the potential to confuse users
> >> and/or allows him to shoot himself in the foot. Consequently, I am +1
> for
> >> the first proposal in the FLIP since it is simpler.
> >>
> >> Also +1 for Stephan's proposal to combine batch operator's and
> >> RocksDB's memory usage into one weight.
> >>
> >> Concerning the names for the two weights, I fear that we are facing one
> of
> >> the two hard things in computer science. To add another idea, we could
> >> name
> >> them "Flink memory"/"Internal memory" and "Python memory".
> >>
> >> For the sake of making the scope of the FLIP as small as possible and to
> >> develop the feature incrementally, I think that Aljoscha's proposal to
> >> make
> >> it non-configurable for the first step sounds like a good idea. As a
> next
> >> step (and also if we see need), we can make the memory weights
> >> configurable
> >> via the configuration. And last, we could expose it via the
> >> ExecutionConfig
> >> if it is required.
> >>
> >> Cheers,
> >> Till
> >>
> >> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <al...@apache.org>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > playing devils advocate here: should we even make the memory weights
> >> > configurable? We could go with weights that should make sense for most
> >> > cases in the first version and only introduce configurable weights
> when
> >> > (if) users need them.
> >> >
> >> > Regarding where/how things are configured, I think that most things
> >> > should be a ConfigOption first (Thanks cc'in me, Stephan!). This makes
> >> > them configurable via flink-conf.yaml and via command line parameters,
> >> > for example "bin/flink run -D memory.foo=bla ...". We can think about
> >> > offering programmatic API for cases where it makes sense, of course.
> >> >
> >> > Regarding naming one of the configurable weights
> >> > "StateBackend-BatchAlgorithm". I think it's not a good idea to be that
> >> > specific because the option will not age well. For example when we
> want
> >> > to change which group of memory consumers are configured together or
> >> > when we add something new.
> >> >
> >> > Best,
> >> > Aljoscha
> >> >
> >> > On 31.08.20 08:13, Xintong Song wrote:
> >> > > Thanks for the feedbacks, @Stephan
> >> > >
> >> > >
> >> > >    - There is a push to make as much as possible configurable via
> the
> >> > main
> >> > >> configuration, and not only in code. Specifically values for
> >> operations
> >> > and
> >> > >> tuning.
> >> > >>      I think it would be more important to have such memory weights
> >> in
> >> > the
> >> > >> config, compared to in the program API. /cc Aljoscha
> >> > >
> >> > >
> >> > > I can see the benefit that having memory weights in the main
> >> > configuration
> >> > > makes tuning easier, which makes great sense to me. On the other
> hand,
> >> > what
> >> > > we lose is the flexibility to have different weights for jobs
> running
> >> in
> >> > > the same Flink cluster. It seems to me the problem is that we don't
> >> have
> >> > an
> >> > > easy way to overwrite job-specific configurations without touching
> the
> >> > > codes.
> >> > >
> >> > >
> >> > > Given the current status, what if we make the memory weights
> >> configurable
> >> > > through both the main configuration and the programming API? The
> main
> >> > > configuration should take effect iff the weights are not explicitly
> >> > > specified through the programming API. In this way, job cluster
> users
> >> can
> >> > > easily tune the weight through the main configuration, while session
> >> > > cluster users, if they want to have different weights for jobs, can
> >> still
> >> > > overwrite the weight through execution configs.
> >> > >
> >> > >
> >> > >    - My recommendation would be to keep this as simple as possible.
> >> This
> >> > >> will make a lot of configuration code harder, and make it harder
> for
> >> > users
> >> > >> to understand Flink's memory model.
> >> > >>      Making things as easy for users to understand is very
> important
> >> in
> >> > my
> >> > >> opinion. In that regard, the main proposal in the FLIP seems better
> >> than
> >> > >> the alternative proposal listed at the end of the FLIP page.
> >> > >
> >> > > +1 from my side.
> >> > >
> >> > >
> >> > >    - For the simplicity, we could go even further and simply have
> two
> >> > memory
> >> > >> users at the moment: The operator algorithm/data-structure and the
> >> > external
> >> > >> language process (Python for now).
> >> > >>      We never have batch algos and RocksDB mixed, having this as
> >> > separate
> >> > >> options is confusing as it suggests this can be combined
> >> arbitrarily. I
> >> > >> also think that a slim possibility that we may ever combine this in
> >> the
> >> > >> future is not enough reason to make it more complex/confusing.
> >> > >
> >> > >
> >> > > Good point. +1 for combining batch/rocksdb weights, for they're
> never
> >> > mixed
> >> > > together. We can even just name it "StateBackend-BatchAlgorithm" for
> >> > > explicitly.
> >> > >
> >> > >
> >> > > For "external language process", I'm not entirely sure. Future
> >> external
> >> > > languages are possibly mixed with python processes. To avoid later
> >> > > considering how to share external language memory across different
> >> > > languages, I would suggest to present the concept as "python memory"
> >> > rather
> >> > > than "external language process memory".
> >> > >
> >> > >
> >> > > Thank you~
> >> > >
> >> > > Xintong Song
> >> > >
> >> > >
> >> > >
> >> > > On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org>
> >> wrote:
> >> > >
> >> > >> Thanks for driving this proposal. A few thoughts on the current
> >> design:
> >> > >>
> >> > >>    - There is a push to make as much as possible configurable via
> the
> >> > main
> >> > >> configuration, and not only in code. Specifically values for
> >> operations
> >> > and
> >> > >> tuning.
> >> > >>      I think it would be more important to have such memory weights
> >> in
> >> > the
> >> > >> config, compared to in the program API. /cc Aljoscha
> >> > >>
> >> > >>    - My recommendation would be to keep this as simple as possible.
> >> This
> >> > >> will make a lot of configuration code harder, and make it harder
> for
> >> > users
> >> > >> to understand Flink's memory model.
> >> > >>      Making things as easy for users to understand is very
> important
> >> in
> >> > my
> >> > >> opinion. In that regard, the main proposal in the FLIP seems better
> >> than
> >> > >> the alternative proposal listed at the end of the FLIP page.
> >> > >>
> >> > >>    - For the simplicity, we could go even further and simply have
> two
> >> > memory
> >> > >> users at the moment: The operator algorithm/data-structure and the
> >> > external
> >> > >> language process (Python for now).
> >> > >>      We never have batch algos and RocksDB mixed, having this as
> >> > separate
> >> > >> options is confusing as it suggests this can be combined
> >> arbitrarily. I
> >> > >> also think that a slim possibility that we may ever combine this in
> >> the
> >> > >> future is not enough reason to make it more complex/confusing.
> >> > >>
> >> > >>    - I am also not aware of any plans to combine the network and
> >> > operator
> >> > >> memory. Not that it would be infeasible to do this, but I think
> this
> >> > would
> >> > >> also be orthogonal to this change, and I am not sure this would be
> >> > solved
> >> > >> with static weights. So trying to get network memory into this
> >> proposal
> >> > >> seems pre-mature to me.
> >> > >>
> >> > >> Best,
> >> > >> Stephan
> >> > >>
> >> > >>
> >> > >> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <
> tonysong820@gmail.com
> >> >
> >> > >> wrote:
> >> > >>
> >> > >>>>
> >> > >>>> A quick question, does network memory treated as managed memory
> >> now?
> >> > Or
> >> > >>> in
> >> > >>>> the future?
> >> > >>>>
> >> > >>> No, network memory is independent from managed memory ATM. And I'm
> >> not
> >> > >>> aware of any plan to combine these two.
> >> > >>>
> >> > >>> Any insights there?
> >> > >>>
> >> > >>> Thank you~
> >> > >>>
> >> > >>> Xintong Song
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <yk...@gmail.com>
> >> wrote:
> >> > >>>
> >> > >>>> A quick question, does network memory treated as managed memory
> >> now?
> >> > Or
> >> > >>> in
> >> > >>>> the future?
> >> > >>>>
> >> > >>>> Best,
> >> > >>>> Kurt
> >> > >>>>
> >> > >>>>
> >> > >>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <
> >> tonysong820@gmail.com>
> >> > >>>> wrote:
> >> > >>>>
> >> > >>>>> Hi devs,
> >> > >>>>>
> >> > >>>>> I'd like to bring the discussion over FLIP-141[1], which
> proposes
> >> how
> >> > >>>>> managed memory should be shared by various use cases within a
> >> slot.
> >> > >>> This
> >> > >>>> is
> >> > >>>>> an extension to FLIP-53[2], where we assumed that RocksDB state
> >> > >> backend
> >> > >>>> and
> >> > >>>>> batch operators are the only use cases of managed memory for
> >> > >> streaming
> >> > >>>> and
> >> > >>>>> batch jobs respectively, which is no longer true with the
> >> > >> introduction
> >> > >>> of
> >> > >>>>> Python UDFs.
> >> > >>>>>
> >> > >>>>> Please notice that we have not reached consensus between two
> >> > >> different
> >> > >>>>> designs. The major part of this FLIP describes one of the
> >> candidates,
> >> > >>>> while
> >> > >>>>> the alternative is discussed in the section "Rejected
> >> Alternatives".
> >> > >> We
> >> > >>>> are
> >> > >>>>> hoping to borrow intelligence from the community to help us
> >> resolve
> >> > >> the
> >> > >>>>> disagreement.
> >> > >>>>>
> >> > >>>>> Any feedback would be appreciated.
> >> > >>>>>
> >> > >>>>> Thank you~
> >> > >>>>>
> >> > >>>>> Xintong Song
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> [1]
> >> > >>>>>
> >> > >>>>>
> >> > >>>>
> >> > >>>
> >> > >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> >> > >>>>>
> >> > >>>>> [2]
> >> > >>>>>
> >> > >>>>>
> >> > >>>>
> >> > >>>
> >> > >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> >> > >>>>>
> >> > >>>>
> >> > >>>
> >> > >>
> >> > >
> >> >
> >> >
> >>
> >
>

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Posted by Xintong Song <to...@gmail.com>.
Thanks all for the feedback and discussion.

I have updated the FLIP with the following changes:

   - Choose the main proposal over the alternative approach.
   - Combine the weights of RocksDB and batch operators.
   - Expose weights through configuration options rather than via
   ExecutionConfig (see the sketch below).
   - Add an implementation plan.
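
For illustration, the new options might look like the following in
flink-conf.yaml. This is only a sketch: the final option name, the consumer
keys and the default values are still to be settled during implementation.

# Sketch only -- option name, keys and defaults are not final.
# One combined weight for RocksDB / batch algorithms, one for Python.
taskmanager.memory.managed.consumer-weights: DATAPROC:70,PYTHON:30

Being a regular ConfigOption, this could then also be overridden per job,
e.g. via "bin/flink run -D ..." as Aljoscha pointed out.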

Please help take another look.

Thank you~

Xintong Song



On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <to...@gmail.com> wrote:

> Thanks for the inputs, Aljoscha & Till.
>
>
> # Weight Configuration
>
>
> I think exposing the knobs incrementally is a good idea. However, I'm not
> sure about non-configurable as the first step.
>
>
> Currently, users can tune memory for rocksdb
> ('taskmanager.memory.managed.size') and python
> ('python.fn-execution.[framework|buffer].memory.size') separately, which
> practically means any combination of rocksdb and python memory sizes. If we
> switch to non-configurable weights, that will be a regression compared to
> 1.11.
>
>
> Therefore, I think exposing via configuration options might be a good
> first step. And we can discuss exposing via ExecutionConfig if later we see
> that requirement.
>
>
> # Naming of Weights
>
>
> I'm neutral for "Flink/Internal memory".
>
>
> I think the reason we can combine weights for batch algorithms and state
> backends is that they are never mixed together. My only concern
> for "Flink/Internal memory", which might not be a problem at the moment, is
> that what if new memory use cases appear in the future, which can also be
> described by "Flink/Internal memory" but is not guaranteed not mixed with
> batch algorithms or state backends?
>
>
> Anyway, I think the naming should not block this FLIP, as long as we have
> consensus on combining the two weights for rocksdb and batch algorithms. We
> can keep the naming discussion open until the implementation phase.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <tr...@apache.org>
> wrote:
>
>> Thanks for creating this FLIP Xintong.
>>
>> I agree with the previous comments that the memory configuration should be
>> as easy as possible. Every new knob has the potential to confuse users
>> and/or allows him to shoot himself in the foot. Consequently, I am +1 for
>> the first proposal in the FLIP since it is simpler.
>>
>> Also +1 for Stephan's proposal to combine batch operator's and
>> RocksDB's memory usage into one weight.
>>
>> Concerning the names for the two weights, I fear that we are facing one of
>> the two hard things in computer science. To add another idea, we could
>> name
>> them "Flink memory"/"Internal memory" and "Python memory".
>>
>> For the sake of making the scope of the FLIP as small as possible and to
>> develop the feature incrementally, I think that Aljoscha's proposal to
>> make
>> it non-configurable for the first step sounds like a good idea. As a next
>> step (and also if we see need), we can make the memory weights
>> configurable
>> via the configuration. And last, we could expose it via the
>> ExecutionConfig
>> if it is required.
>>
>> Cheers,
>> Till
>>
>> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <al...@apache.org>
>> wrote:
>>
>> > Hi,
>> >
>> > playing devils advocate here: should we even make the memory weights
>> > configurable? We could go with weights that should make sense for most
>> > cases in the first version and only introduce configurable weights when
>> > (if) users need them.
>> >
>> > Regarding where/how things are configured, I think that most things
>> > should be a ConfigOption first (Thanks cc'in me, Stephan!). This makes
>> > them configurable via flink-conf.yaml and via command line parameters,
>> > for example "bin/flink run -D memory.foo=bla ...". We can think about
>> > offering programmatic API for cases where it makes sense, of course.
>> >
>> > Regarding naming one of the configurable weights
>> > "StateBackend-BatchAlgorithm". I think it's not a good idea to be that
>> > specific because the option will not age well. For example when we want
>> > to change which group of memory consumers are configured together or
>> > when we add something new.
>> >
>> > Best,
>> > Aljoscha
>> >
>> > On 31.08.20 08:13, Xintong Song wrote:
>> > > Thanks for the feedbacks, @Stephan
>> > >
>> > >
>> > >    - There is a push to make as much as possible configurable via the
>> > main
>> > >> configuration, and not only in code. Specifically values for
>> operations
>> > and
>> > >> tuning.
>> > >>      I think it would be more important to have such memory weights
>> in
>> > the
>> > >> config, compared to in the program API. /cc Aljoscha
>> > >
>> > >
>> > > I can see the benefit that having memory weights in the main
>> > configuration
>> > > makes tuning easier, which makes great sense to me. On the other hand,
>> > what
>> > > we lose is the flexibility to have different weights for jobs running
>> in
>> > > the same Flink cluster. It seems to me the problem is that we don't
>> have
>> > an
>> > > easy way to overwrite job-specific configurations without touching the
>> > > codes.
>> > >
>> > >
>> > > Given the current status, what if we make the memory weights
>> configurable
>> > > through both the main configuration and the programming API? The main
>> > > configuration should take effect iff the weights are not explicitly
>> > > specified through the programming API. In this way, job cluster users
>> can
>> > > easily tune the weight through the main configuration, while session
>> > > cluster users, if they want to have different weights for jobs, can
>> still
>> > > overwrite the weight through execution configs.
>> > >
>> > >
>> > >    - My recommendation would be to keep this as simple as possible.
>> This
>> > >> will make a lot of configuration code harder, and make it harder for
>> > users
>> > >> to understand Flink's memory model.
>> > >>      Making things as easy for users to understand is very important
>> in
>> > my
>> > >> opinion. In that regard, the main proposal in the FLIP seems better
>> than
>> > >> the alternative proposal listed at the end of the FLIP page.
>> > >
>> > > +1 from my side.
>> > >
>> > >
>> > >    - For the simplicity, we could go even further and simply have two
>> > memory
>> > >> users at the moment: The operator algorithm/data-structure and the
>> > external
>> > >> language process (Python for now).
>> > >>      We never have batch algos and RocksDB mixed, having this as
>> > separate
>> > >> options is confusing as it suggests this can be combined
>> arbitrarily. I
>> > >> also think that a slim possibility that we may ever combine this in
>> the
>> > >> future is not enough reason to make it more complex/confusing.
>> > >
>> > >
>> > > Good point. +1 for combining batch/rocksdb weights, for they're never
>> > mixed
>> > > together. We can even just name it "StateBackend-BatchAlgorithm" for
>> > > explicitly.
>> > >
>> > >
>> > > For "external language process", I'm not entirely sure. Future
>> external
>> > > languages are possibly mixed with python processes. To avoid later
>> > > considering how to share external language memory across different
>> > > languages, I would suggest to present the concept as "python memory"
>> > rather
>> > > than "external language process memory".
>> > >
>> > >
>> > > Thank you~
>> > >
>> > > Xintong Song
>> > >
>> > >
>> > >
>> > > On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org>
>> wrote:
>> > >
>> > >> Thanks for driving this proposal. A few thoughts on the current
>> design:
>> > >>
>> > >>    - There is a push to make as much as possible configurable via the
>> > main
>> > >> configuration, and not only in code. Specifically values for
>> operations
>> > and
>> > >> tuning.
>> > >>      I think it would be more important to have such memory weights
>> in
>> > the
>> > >> config, compared to in the program API. /cc Aljoscha
>> > >>
>> > >>    - My recommendation would be to keep this as simple as possible.
>> This
>> > >> will make a lot of configuration code harder, and make it harder for
>> > users
>> > >> to understand Flink's memory model.
>> > >>      Making things as easy for users to understand is very important
>> in
>> > my
>> > >> opinion. In that regard, the main proposal in the FLIP seems better
>> than
>> > >> the alternative proposal listed at the end of the FLIP page.
>> > >>
>> > >>    - For the simplicity, we could go even further and simply have two
>> > memory
>> > >> users at the moment: The operator algorithm/data-structure and the
>> > external
>> > >> language process (Python for now).
>> > >>      We never have batch algos and RocksDB mixed, having this as
>> > separate
>> > >> options is confusing as it suggests this can be combined
>> arbitrarily. I
>> > >> also think that a slim possibility that we may ever combine this in
>> the
>> > >> future is not enough reason to make it more complex/confusing.
>> > >>
>> > >>    - I am also not aware of any plans to combine the network and
>> > operator
>> > >> memory. Not that it would be infeasible to do this, but I think this
>> > would
>> > >> also be orthogonal to this change, and I am not sure this would be
>> > solved
>> > >> with static weights. So trying to get network memory into this
>> proposal
>> > >> seems pre-mature to me.
>> > >>
>> > >> Best,
>> > >> Stephan
>> > >>
>> > >>
>> > >> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <tonysong820@gmail.com
>> >
>> > >> wrote:
>> > >>
>> > >>>>
>> > >>>> A quick question, does network memory treated as managed memory
>> now?
>> > Or
>> > >>> in
>> > >>>> the future?
>> > >>>>
>> > >>> No, network memory is independent from managed memory ATM. And I'm
>> not
>> > >>> aware of any plan to combine these two.
>> > >>>
>> > >>> Any insights there?
>> > >>>
>> > >>> Thank you~
>> > >>>
>> > >>> Xintong Song
>> > >>>
>> > >>>
>> > >>>
>> > >>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <yk...@gmail.com>
>> wrote:
>> > >>>
>> > >>>> A quick question, does network memory treated as managed memory
>> now?
>> > Or
>> > >>> in
>> > >>>> the future?
>> > >>>>
>> > >>>> Best,
>> > >>>> Kurt
>> > >>>>
>> > >>>>
>> > >>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <
>> tonysong820@gmail.com>
>> > >>>> wrote:
>> > >>>>
>> > >>>>> Hi devs,
>> > >>>>>
>> > >>>>> I'd like to bring the discussion over FLIP-141[1], which proposes
>> how
>> > >>>>> managed memory should be shared by various use cases within a
>> slot.
>> > >>> This
>> > >>>> is
>> > >>>>> an extension to FLIP-53[2], where we assumed that RocksDB state
>> > >> backend
>> > >>>> and
>> > >>>>> batch operators are the only use cases of managed memory for
>> > >> streaming
>> > >>>> and
>> > >>>>> batch jobs respectively, which is no longer true with the
>> > >> introduction
>> > >>> of
>> > >>>>> Python UDFs.
>> > >>>>>
>> > >>>>> Please notice that we have not reached consensus between two
>> > >> different
>> > >>>>> designs. The major part of this FLIP describes one of the
>> candidates,
>> > >>>> while
>> > >>>>> the alternative is discussed in the section "Rejected
>> Alternatives".
>> > >> We
>> > >>>> are
>> > >>>>> hoping to borrow intelligence from the community to help us
>> resolve
>> > >> the
>> > >>>>> disagreement.
>> > >>>>>
>> > >>>>> Any feedback would be appreciated.
>> > >>>>>
>> > >>>>> Thank you~
>> > >>>>>
>> > >>>>> Xintong Song
>> > >>>>>
>> > >>>>>
>> > >>>>> [1]
>> > >>>>>
>> > >>>>>
>> > >>>>
>> > >>>
>> > >>
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
>> > >>>>>
>> > >>>>> [2]
>> > >>>>>
>> > >>>>>
>> > >>>>
>> > >>>
>> > >>
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>> > >>>>>
>> > >>>>
>> > >>>
>> > >>
>> > >
>> >
>> >
>>
>

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Posted by Xintong Song <to...@gmail.com>.
Thanks for the inputs, Aljoscha & Till.


# Weight Configuration


I think exposing the knobs incrementally is a good idea. However, I'm not
sure about making the weights non-configurable as the first step.


Currently, users can tune memory for RocksDB
('taskmanager.memory.managed.size') and Python
('python.fn-execution.[framework|buffer].memory.size') separately, which
practically allows any combination of RocksDB and Python memory sizes. If
we switch to non-configurable weights, that would be a regression compared
to 1.11.
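
For reference, this is roughly how the two budgets are tuned independently
today (option names as of Flink 1.11; the sizes are arbitrary examples):

taskmanager.memory.managed.size: 1gb
python.fn-execution.framework.memory.size: 64mb
python.fn-execution.buffer.memory.size: 15mb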


Therefore, I think exposing the weights via configuration options might be
a good first step. And we can discuss exposing them via the ExecutionConfig
if we later see that requirement.


# Naming of Weights


I'm neutral on "Flink/Internal memory".


I think the reason we can combine the weights for batch algorithms and
state backends is that they are never mixed together. My only concern
with "Flink/Internal memory", which might not be a problem at the moment,
is: what if new memory use cases appear in the future that can also be
described as "Flink/Internal memory", but are not guaranteed to never mix
with batch algorithms or state backends?


Anyway, I think the naming should not block this FLIP, as long as we have
consensus on combining the two weights for RocksDB and batch algorithms. We
can keep the naming discussion open until the implementation phase.


Thank you~

Xintong Song



On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <tr...@apache.org> wrote:

> Thanks for creating this FLIP Xintong.
>
> I agree with the previous comments that the memory configuration should be
> as easy as possible. Every new knob has the potential to confuse users
> and/or allows him to shoot himself in the foot. Consequently, I am +1 for
> the first proposal in the FLIP since it is simpler.
>
> Also +1 for Stephan's proposal to combine batch operator's and
> RocksDB's memory usage into one weight.
>
> Concerning the names for the two weights, I fear that we are facing one of
> the two hard things in computer science. To add another idea, we could name
> them "Flink memory"/"Internal memory" and "Python memory".
>
> For the sake of making the scope of the FLIP as small as possible and to
> develop the feature incrementally, I think that Aljoscha's proposal to make
> it non-configurable for the first step sounds like a good idea. As a next
> step (and also if we see need), we can make the memory weights configurable
> via the configuration. And last, we could expose it via the ExecutionConfig
> if it is required.
>
> Cheers,
> Till
>
> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <al...@apache.org>
> wrote:
>
> > Hi,
> >
> > playing devils advocate here: should we even make the memory weights
> > configurable? We could go with weights that should make sense for most
> > cases in the first version and only introduce configurable weights when
> > (if) users need them.
> >
> > Regarding where/how things are configured, I think that most things
> > should be a ConfigOption first (Thanks cc'in me, Stephan!). This makes
> > them configurable via flink-conf.yaml and via command line parameters,
> > for example "bin/flink run -D memory.foo=bla ...". We can think about
> > offering programmatic API for cases where it makes sense, of course.
> >
> > Regarding naming one of the configurable weights
> > "StateBackend-BatchAlgorithm". I think it's not a good idea to be that
> > specific because the option will not age well. For example when we want
> > to change which group of memory consumers are configured together or
> > when we add something new.
> >
> > Best,
> > Aljoscha
> >
> > On 31.08.20 08:13, Xintong Song wrote:
> > > Thanks for the feedbacks, @Stephan
> > >
> > >
> > >    - There is a push to make as much as possible configurable via the
> > main
> > >> configuration, and not only in code. Specifically values for
> operations
> > and
> > >> tuning.
> > >>      I think it would be more important to have such memory weights in
> > the
> > >> config, compared to in the program API. /cc Aljoscha
> > >
> > >
> > > I can see the benefit that having memory weights in the main
> > configuration
> > > makes tuning easier, which makes great sense to me. On the other hand,
> > what
> > > we lose is the flexibility to have different weights for jobs running
> in
> > > the same Flink cluster. It seems to me the problem is that we don't
> have
> > an
> > > easy way to overwrite job-specific configurations without touching the
> > > codes.
> > >
> > >
> > > Given the current status, what if we make the memory weights
> configurable
> > > through both the main configuration and the programming API? The main
> > > configuration should take effect iff the weights are not explicitly
> > > specified through the programming API. In this way, job cluster users
> can
> > > easily tune the weight through the main configuration, while session
> > > cluster users, if they want to have different weights for jobs, can
> still
> > > overwrite the weight through execution configs.
> > >
> > >
> > >    - My recommendation would be to keep this as simple as possible.
> This
> > >> will make a lot of configuration code harder, and make it harder for
> > users
> > >> to understand Flink's memory model.
> > >>      Making things as easy for users to understand is very important
> in
> > my
> > >> opinion. In that regard, the main proposal in the FLIP seems better
> than
> > >> the alternative proposal listed at the end of the FLIP page.
> > >
> > > +1 from my side.
> > >
> > >
> > >    - For the simplicity, we could go even further and simply have two
> > memory
> > >> users at the moment: The operator algorithm/data-structure and the
> > external
> > >> language process (Python for now).
> > >>      We never have batch algos and RocksDB mixed, having this as
> > separate
> > >> options is confusing as it suggests this can be combined arbitrarily.
> I
> > >> also think that a slim possibility that we may ever combine this in
> the
> > >> future is not enough reason to make it more complex/confusing.
> > >
> > >
> > > Good point. +1 for combining batch/rocksdb weights, for they're never
> > mixed
> > > together. We can even just name it "StateBackend-BatchAlgorithm" for
> > > explicitly.
> > >
> > >
> > > For "external language process", I'm not entirely sure. Future external
> > > languages are possibly mixed with python processes. To avoid later
> > > considering how to share external language memory across different
> > > languages, I would suggest to present the concept as "python memory"
> > rather
> > > than "external language process memory".
> > >
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org>
> wrote:
> > >
> > >> Thanks for driving this proposal. A few thoughts on the current
> design:
> > >>
> > >>    - There is a push to make as much as possible configurable via the
> > main
> > >> configuration, and not only in code. Specifically values for
> operations
> > and
> > >> tuning.
> > >>      I think it would be more important to have such memory weights in
> > the
> > >> config, compared to in the program API. /cc Aljoscha
> > >>
> > >>    - My recommendation would be to keep this as simple as possible.
> This
> > >> will make a lot of configuration code harder, and make it harder for
> > users
> > >> to understand Flink's memory model.
> > >>      Making things as easy for users to understand is very important
> in
> > my
> > >> opinion. In that regard, the main proposal in the FLIP seems better
> than
> > >> the alternative proposal listed at the end of the FLIP page.
> > >>
> > >>    - For the simplicity, we could go even further and simply have two
> > memory
> > >> users at the moment: The operator algorithm/data-structure and the
> > external
> > >> language process (Python for now).
> > >>      We never have batch algos and RocksDB mixed, having this as
> > separate
> > >> options is confusing as it suggests this can be combined arbitrarily.
> I
> > >> also think that a slim possibility that we may ever combine this in
> the
> > >> future is not enough reason to make it more complex/confusing.
> > >>
> > >>    - I am also not aware of any plans to combine the network and
> > operator
> > >> memory. Not that it would be infeasible to do this, but I think this
> > would
> > >> also be orthogonal to this change, and I am not sure this would be
> > solved
> > >> with static weights. So trying to get network memory into this
> proposal
> > >> seems pre-mature to me.
> > >>
> > >> Best,
> > >> Stephan
> > >>
> > >>
> > >> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <to...@gmail.com>
> > >> wrote:
> > >>
> > >>>>
> > >>>> A quick question, does network memory treated as managed memory now?
> > Or
> > >>> in
> > >>>> the future?
> > >>>>
> > >>> No, network memory is independent from managed memory ATM. And I'm
> not
> > >>> aware of any plan to combine these two.
> > >>>
> > >>> Any insights there?
> > >>>
> > >>> Thank you~
> > >>>
> > >>> Xintong Song
> > >>>
> > >>>
> > >>>
> > >>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <yk...@gmail.com> wrote:
> > >>>
> > >>>> A quick question, does network memory treated as managed memory now?
> > Or
> > >>> in
> > >>>> the future?
> > >>>>
> > >>>> Best,
> > >>>> Kurt
> > >>>>
> > >>>>
> > >>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <tonysong820@gmail.com
> >
> > >>>> wrote:
> > >>>>
> > >>>>> Hi devs,
> > >>>>>
> > >>>>> I'd like to bring the discussion over FLIP-141[1], which proposes
> how
> > >>>>> managed memory should be shared by various use cases within a slot.
> > >>> This
> > >>>> is
> > >>>>> an extension to FLIP-53[2], where we assumed that RocksDB state
> > >> backend
> > >>>> and
> > >>>>> batch operators are the only use cases of managed memory for
> > >> streaming
> > >>>> and
> > >>>>> batch jobs respectively, which is no longer true with the
> > >> introduction
> > >>> of
> > >>>>> Python UDFs.
> > >>>>>
> > >>>>> Please notice that we have not reached consensus between two
> > >> different
> > >>>>> designs. The major part of this FLIP describes one of the
> candidates,
> > >>>> while
> > >>>>> the alternative is discussed in the section "Rejected
> Alternatives".
> > >> We
> > >>>> are
> > >>>>> hoping to borrow intelligence from the community to help us resolve
> > >> the
> > >>>>> disagreement.
> > >>>>>
> > >>>>> Any feedback would be appreciated.
> > >>>>>
> > >>>>> Thank you~
> > >>>>>
> > >>>>> Xintong Song
> > >>>>>
> > >>>>>
> > >>>>> [1]
> > >>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> > >>>>>
> > >>>>> [2]
> > >>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
> >
> >
>

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

Posted by Till Rohrmann <tr...@apache.org>.
Thanks for creating this FLIP, Xintong.

I agree with the previous comments that the memory configuration should be
as easy as possible. Every new knob has the potential to confuse users
and/or allow them to shoot themselves in the foot. Consequently, I am +1
for the first proposal in the FLIP since it is simpler.

Also +1 for Stephan's proposal to combine the batch operators' and
RocksDB's memory usage into one weight.

Concerning the names for the two weights, I fear that we are facing one of
the two hard things in computer science. To add another idea, we could name
them "Flink memory"/"Internal memory" and "Python memory".

For the sake of keeping the scope of the FLIP as small as possible and
developing the feature incrementally, I think that Aljoscha's proposal to
make the weights non-configurable in the first step sounds like a good
idea. As a next step (and if we see the need), we can make the memory
weights configurable via the configuration. And lastly, we could expose
them via the ExecutionConfig if it is required.
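
If we ever get to that last step, it might look something like the sketch
below (purely hypothetical; no such method exists today):

// Hypothetical sketch of a future ExecutionConfig hook. The method name
// and the consumer keys are assumptions, not part of any proposal.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
env.getConfig().setManagedMemoryConsumerWeight("DATAPROC", 60); // assumed
env.getConfig().setManagedMemoryConsumerWeight("PYTHON", 40);   // assumed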

Cheers,
Till

On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <al...@apache.org> wrote:

> Hi,
>
> playing devils advocate here: should we even make the memory weights
> configurable? We could go with weights that should make sense for most
> cases in the first version and only introduce configurable weights when
> (if) users need them.
>
> Regarding where/how things are configured, I think that most things
> should be a ConfigOption first (Thanks cc'in me, Stephan!). This makes
> them configurable via flink-conf.yaml and via command line parameters,
> for example "bin/flink run -D memory.foo=bla ...". We can think about
> offering programmatic API for cases where it makes sense, of course.
>
> Regarding naming one of the configurable weights
> "StateBackend-BatchAlgorithm". I think it's not a good idea to be that
> specific because the option will not age well. For example when we want
> to change which group of memory consumers are configured together or
> when we add something new.
>
> Best,
> Aljoscha
>
> On 31.08.20 08:13, Xintong Song wrote:
> > Thanks for the feedbacks, @Stephan
> >
> >
> >    - There is a push to make as much as possible configurable via the
> main
> >> configuration, and not only in code. Specifically values for operations
> and
> >> tuning.
> >>      I think it would be more important to have such memory weights in
> the
> >> config, compared to in the program API. /cc Aljoscha
> >
> >
> > I can see the benefit that having memory weights in the main
> configuration
> > makes tuning easier, which makes great sense to me. On the other hand,
> what
> > we lose is the flexibility to have different weights for jobs running in
> > the same Flink cluster. It seems to me the problem is that we don't have
> an
> > easy way to overwrite job-specific configurations without touching the
> > codes.
> >
> >
> > Given the current status, what if we make the memory weights configurable
> > through both the main configuration and the programming API? The main
> > configuration should take effect iff the weights are not explicitly
> > specified through the programming API. In this way, job cluster users can
> > easily tune the weight through the main configuration, while session
> > cluster users, if they want to have different weights for jobs, can still
> > overwrite the weight through execution configs.
> >
> >
> >    - My recommendation would be to keep this as simple as possible. This
> >> will make a lot of configuration code harder, and make it harder for
> users
> >> to understand Flink's memory model.
> >>      Making things as easy for users to understand is very important in
> my
> >> opinion. In that regard, the main proposal in the FLIP seems better than
> >> the alternative proposal listed at the end of the FLIP page.
> >
> > +1 from my side.
> >
> >
> >    - For the simplicity, we could go even further and simply have two
> memory
> >> users at the moment: The operator algorithm/data-structure and the
> external
> >> language process (Python for now).
> >>      We never have batch algos and RocksDB mixed, having this as
> separate
> >> options is confusing as it suggests this can be combined arbitrarily. I
> >> also think that a slim possibility that we may ever combine this in the
> >> future is not enough reason to make it more complex/confusing.
> >
> >
> > Good point. +1 for combining batch/rocksdb weights, for they're never
> mixed
> > together. We can even just name it "StateBackend-BatchAlgorithm" for
> > explicitly.
> >
> >
> > For "external language process", I'm not entirely sure. Future external
> > languages are possibly mixed with python processes. To avoid later
> > considering how to share external language memory across different
> > languages, I would suggest to present the concept as "python memory"
> rather
> > than "external language process memory".
> >
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org> wrote:
> >
> >> Thanks for driving this proposal. A few thoughts on the current design:
> >>
> >>    - There is a push to make as much as possible configurable via the
> main
> >> configuration, and not only in code. Specifically values for operations
> and
> >> tuning.
> >>      I think it would be more important to have such memory weights in
> the
> >> config, compared to in the program API. /cc Aljoscha
> >>
> >>    - My recommendation would be to keep this as simple as possible. This
> >> will make a lot of configuration code harder, and make it harder for
> users
> >> to understand Flink's memory model.
> >>      Making things as easy for users to understand is very important in
> my
> >> opinion. In that regard, the main proposal in the FLIP seems better than
> >> the alternative proposal listed at the end of the FLIP page.
> >>
> >>    - For the simplicity, we could go even further and simply have two
> memory
> >> users at the moment: The operator algorithm/data-structure and the
> external
> >> language process (Python for now).
> >>      We never have batch algos and RocksDB mixed, having this as
> separate
> >> options is confusing as it suggests this can be combined arbitrarily. I
> >> also think that a slim possibility that we may ever combine this in the
> >> future is not enough reason to make it more complex/confusing.
> >>
> >>    - I am also not aware of any plans to combine the network and
> operator
> >> memory. Not that it would be infeasible to do this, but I think this
> would
> >> also be orthogonal to this change, and I am not sure this would be
> solved
> >> with static weights. So trying to get network memory into this proposal
> >> seems pre-mature to me.
> >>
> >> Best,
> >> Stephan
> >>
> >>
> >> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <to...@gmail.com>
> >> wrote:
> >>
> >>>>
> >>>> A quick question, does network memory treated as managed memory now?
> Or
> >>> in
> >>>> the future?
> >>>>
> >>> No, network memory is independent from managed memory ATM. And I'm not
> >>> aware of any plan to combine these two.
> >>>
> >>> Any insights there?
> >>>
> >>> Thank you~
> >>>
> >>> Xintong Song
> >>>
> >>>
> >>>
> >>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <yk...@gmail.com> wrote:
> >>>
> >>>> A quick question, does network memory treated as managed memory now?
> Or
> >>> in
> >>>> the future?
> >>>>
> >>>> Best,
> >>>> Kurt
> >>>>
> >>>>
> >>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <to...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi devs,
> >>>>>
> >>>>> I'd like to bring the discussion over FLIP-141[1], which proposes how
> >>>>> managed memory should be shared by various use cases within a slot.
> >>> This
> >>>> is
> >>>>> an extension to FLIP-53[2], where we assumed that RocksDB state
> >> backend
> >>>> and
> >>>>> batch operators are the only use cases of managed memory for
> >> streaming
> >>>> and
> >>>>> batch jobs respectively, which is no longer true with the
> >> introduction
> >>> of
> >>>>> Python UDFs.
> >>>>>
> >>>>> Please notice that we have not reached consensus between two
> >> different
> >>>>> designs. The major part of this FLIP describes one of the candidates,
> >>>> while
> >>>>> the alternative is discussed in the section "Rejected Alternatives".
> >> We
> >>>> are
> >>>>> hoping to borrow intelligence from the community to help us resolve
> >> the
> >>>>> disagreement.
> >>>>>
> >>>>> Any feedback would be appreciated.
> >>>>>
> >>>>> Thank you~
> >>>>>
> >>>>> Xintong Song
> >>>>>
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> >>>>>
> >>>>> [2]
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> >>>>>
> >>>>
> >>>
> >>
> >
>
>