Posted to dev@flink.apache.org by Becket Qin <be...@gmail.com> on 2019/05/06 02:08:52 UTC

Re: [DISCUSS] FLIP-36 - Support Interactive Programming in Flink Table API

Hi Flink devs,

We have had some more discussion about the design proposed in the
last email and made some further modifications to the design of the default
intermediate result storage. I have just updated the FLIP-36 wiki page
to reflect the latest design.

https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink#FLIP-36:SupportInteractiveProgramminginFlink-ImplementationDetails

To summarize briefly, the default intermediate result storage relies on the
network stack to store the intermediate results and maintains all the
intermediate result metadata on the client side. We avoided introducing
additional services in the runtime and instead tried to integrate the design
with existing components as much as possible.
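
To make the client-side part of this summary more concrete, here is a
minimal sketch of what such metadata bookkeeping could look like. It is
only an illustration under my own assumptions: ClientSideCacheRegistry,
CachedResultLocation and all of their methods are hypothetical names, not
types from the FLIP or from Flink.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical: describes where the network stack keeps one cached partition.
final class CachedResultLocation {
    final String taskManagerAddress; // assumed "host:port" of the holder
    final String resultPartitionId;  // assumed ID of the persisted partition

    CachedResultLocation(String taskManagerAddress, String resultPartitionId) {
        this.taskManagerAddress = taskManagerAddress;
        this.resultPartitionId = resultPartitionId;
    }
}

// Hypothetical: metadata lives only in the client, so no extra runtime
// service is needed to remember which tables have been cached.
final class ClientSideCacheRegistry {
    private final Map<String, List<CachedResultLocation>> locationsByTableId =
            new HashMap<>();

    // Record the partitions produced by the job that materialized the table.
    void register(String tableId, List<CachedResultLocation> locations) {
        locationsByTableId.put(
                tableId,
                Collections.unmodifiableList(new ArrayList<>(locations)));
    }

    // A later job asks whether it can read the cached result instead of
    // recomputing the table.
    Optional<List<CachedResultLocation>> lookup(String tableId) {
        return Optional.ofNullable(locationsByTableId.get(tableId));
    }

    // Forget a cached table; returns true if an entry was actually removed.
    boolean invalidate(String tableId) {
        return locationsByTableId.remove(tableId) != null;
    }
}
```

In a design like this, a later job would call lookup before planning and
fall back to the original computation when nothing is registered. How the
client learns that a cached result is no longer available is outside the
scope of the sketch.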

Looking forward to your feedback.

Thanks,

Jiangjie (Becket) Qin



On Wed, Apr 10, 2019 at 9:36 PM Becket Qin <be...@gmail.com> wrote:

> Hi folks,
>
> Just want to revive this discussion thread. A few of us had some offline
> discussions around the implementation details of this FLIP.
>
> Here I briefly summarize the offline discussion:
>
> --
> Some concerns were raised about the default implementation of the cache
> service.
> 1. The default cache service introduces a separate service in the Flink
> runtime, which seems complicated, especially when things like colocation are
> needed.
> 2. Using a Flink job to run the default cache service may expose unnecessary
> implementation details to the users (e.g. it may take up some slots and
> resources).
> 3. Sharing the persistent shuffle in the network stack may need
> additional work in the runtime.
>
> In the interest of addressing the above concerns, we would like to make
> some changes to the current FLIP proposal.
>
> In general we agreed that our primary goal is to unify the storage tier of
> default shuffle service and default intermediate result storage.
>
> Stephan gave some valuable suggestions on how to improve the current FLIP
> design and to align with the efforts of FLIP-31. Some highlights are:
>   1. Unify the storage tier of the default shuffle service and the default
> intermediate result storage on the network stack.
>   2. We need both internal (default) and external services for Shuffle and
> Intermediate Result. The internal (default) implementation is for the
> out-of-the-box user experience. The external service is for more
> sophisticated use cases.
>   3. Have two interfaces, *ShuffleService* and *IntermediateResultStorage*
> (for explicit cache handling). The internal default network-stack-based
> solution implements both interfaces.
> --
>
> As a result of these discussions, we would like to add a few more things
> to the current FLIP-36. More specifically:
> 1. A pluggable IntermediateResultStorage interface (for explicit cache
> handling).
> 2. A mechanism that allows intermediate results (persisted shuffle and
> explicit cache) to be referenced across jobs.
> 3. A stack to manage intermediate result metadata (persisted shuffle and
> explicit cache) in the runtime.
>
> The detailed design is explained in the following doc. The doc is mostly
> about the implementation of the default intermediate result storage.
> API-wise, it is an addition to the existing Table API change proposed in
> the FLIP.
>
>
> https://docs.google.com/document/d/17twjcQn70rJnVCXcr74AL44HY3jLeT1leC9rAFsluFg/edit#
>
> I'll update FLIP-36 wiki to reflect the new proposal. But we can probably
> use the Google Doc for discussion right now while I am updating the FLIP
> wiki.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Mar 14, 2019 at 9:28 PM Becket Qin <be...@gmail.com> wrote:
>
>> Thanks Piotr, for the +1 and all the patient discussion :)
>>
>> On Wed, Mar 13, 2019 at 3:53 PM Piotr Nowojski <pi...@ververica.com>
>> wrote:
>>
>>> Hi Becket,
>>>
>>> Thank you for driving the effort and writing down the detailed proposal.
>>> To me this FLIP looks good and it has +1 from me.
>>>
>>> Piotr Nowojski
>>>
>>> > On 12 Mar 2019, at 13:21, Becket Qin <be...@gmail.com> wrote:
>>> >
>>> > Hi folks,
>>> >
>>> > We would like to start the discussion thread about FLIP-36 support
>>> > interactive programming in Flink Table API.
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
>>> >
>>> > There has been an extended discussion[1] on the mailing list. To quickly
>>> > recap, we propose to add the capability of caching intermediate results
>>> > in user applications for later use.
>>> >
>>> > Feedback and comments are welcome!
>>> >
>>> > Thanks,
>>> >
>>> > Jiangjie (Becket) Qin
>>> >
>>> > [1]
>>> >
>>> http://mail-archives.apache.org/mod_mbox/flink-dev/201811.mbox/%3CCABtAgwERNR8otaMdT4f-mFZR5s956K530+NXt2s7iEH4i4gd7g@mail.gmail.com%3E
>>>
>>>

Re: [DISCUSS] FLIP-36 - Support Interactive Programming in Flink Table API

Posted by Stephan Ewen <se...@apache.org>.
The FLIP looks good and is quite detailed, thanks!
I think we should proceed to a vote on whether to accept this FLIP.

If the feature and design are accepted, the next step would be to have an
implementation breakdown.

Best,
Stephan


