You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Aljoscha Krettek <al...@apache.org> on 2016/11/02 09:16:11 UTC

Re: [DISCUSS] FLIP-2 Extending Window Function Metadata

I think we reached consensus here so I would like to mark this FLIP as
accepted. We will now process with implementing the first step, i.e. adding
the new ProcessWindowFunction.

On Mon, 1 Aug 2016 at 18:08 Aljoscha Krettek <al...@apache.org> wrote:

> Alright, that seems reasonable. I updated the doc to add the Collector to
> the method signature again.
>
> On Mon, 1 Aug 2016 at 00:59 Stephan Ewen <se...@apache.org> wrote:
>
> The Collector is pretty integral in all the other functions that return
> multiple elements. I honestly don't see us switching away from it, given
> that it is such a core part of the API.
>
> The close() method has, to my best knowledge, not caused issues, yet. I
> cannot recall anyone mentioning that the close() method confused them, they
> accidentally called it, etc.
> I am wondering whether this is more of a theoretical than practical issue.
>
> If we move one function away from Collector to be on the safe side for a
> "maybe change in the future", while keeping Collector in all other
> functions - I think that fragments the API concept wise more than it
> improves anything.
>
> On Sun, Jul 31, 2016 at 7:10 PM, Aljoscha Krettek <al...@apache.org>
> wrote:
>
> > @Stephan: For the Output, should we keep using a Collector (which
> exposes)
> > the close() method which should never be called by users or create a new
> > Output type that only has an "output" method. Collector can also be used
> > but with a close() method that doesn't do anything. In the long run, I
> > thought it might be better to switch the type away from Collector.
> >
> > Cheers,
> > Aljoscha
> >
> > On Wed, 20 Jul 2016 at 01:25 Maximilian Michels <mx...@apache.org> wrote:
> >
> > > I think it looks like Beam rather than Hadoop :)
> > >
> > > What Stephan meant was that he wanted a dedicated output method in the
> > > ProcessWindowFunction. I agree with Aljoscha that we shouldn't expose
> > > the collector.
> > >
> > > On Tue, Jul 19, 2016 at 10:45 PM, Aljoscha Krettek <
> aljoscha@apache.org>
> > > wrote:
> > > > You mean keep the Collector? I don't like that one because it has the
> > > > close() method that should never be called by the user.
> > > >
> > > > We can keep it, though, because all the other user function
> interfaces
> > > also
> > > > expose it to the user.
> > > >
> > > > On Tue, 19 Jul 2016 at 15:22 Stephan Ewen <se...@apache.org> wrote:
> > > >
> > > >> I would actually make the output a separate parameter as well.
> Pretty
> > > much
> > > >> like the old variant, only replacing the "Window" parameter by the
> > > context
> > > >> (which contains everything about the window).
> > > >> It could also be called "WindowInvocationContext" or so.
> > > >>
> > > >> The current variant looks too Hadoop to me ;-) Everything done on
> the
> > > >> context object, and messy mocking when creating tests.
> > > >>
> > > >> On Mon, Jul 18, 2016 at 6:42 PM, Radu Tudoran <
> > radu.tudoran@huawei.com>
> > > >> wrote:
> > > >>
> > > >> > Hi,
> > > >> >
> > > >> > Sorry - I made a mistake - I was thinking of getting access to the
> > > >> > collection (mist-read :) collector) of events in the window buffer
> > in
> > > >> > order to be able to delete/evict some of them which are not
> > necessary
> > > the
> > > >> > last ones.
> > > >> >
> > > >> >
> > > >> > Radu
> > > >> >
> > > >> > -----Original Message-----
> > > >> > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > >> > Sent: Monday, July 18, 2016 5:54 PM
> > > >> > To: dev@flink.apache.org
> > > >> > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function Metadata
> > > >> >
> > > >> > What about the collector? This is only used for emitting elements
> to
> > > the
> > > >> > downstream operation.
> > > >> >
> > > >> > On Mon, 18 Jul 2016 at 17:52 Radu Tudoran <
> radu.tudoran@huawei.com>
> > > >> wrote:
> > > >> >
> > > >> > > Hi,
> > > >> > >
> > > >> > > I think it looks good and most importantly is that we can extend
> > it
> > > in
> > > >> > > the directions discussed so far.
> > > >> > >
> > > >> > > One question though regarding the Collector - are we going to be
> > > able
> > > >> > > to delete random elements from the list if this is not exposed
> as
> > a
> > > >> > > collection, at least to the evictor? If not, how are we going to
> > > >> > > extend in the future to cover this case?
> > > >> > >
> > > >> > > Regarding the ordering - I also observed that there are
> situations
> > > >> > > where elements do not have a logical order. One example is if
> you
> > > have
> > > >> > > high rates of the events. Nevertheless, even if now is not the
> > time
> > > >> > > for this, I think in the future we can imagine having also some
> > data
> > > >> > > structures that offer some ordering. It can save some
> computation
> > > >> > > efforts later in the functions for some use cases.
> > > >> > >
> > > >> > >
> > > >> > > Best regards,
> > > >> > >
> > > >> > >
> > > >> > > -----Original Message-----
> > > >> > > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > >> > > Sent: Monday, July 18, 2016 3:45 PM
> > > >> > > To: dev@flink.apache.org
> > > >> > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function Metadata
> > > >> > >
> > > >> > > I incorporated the changes. The proposed interface of
> > > >> > > ProcessWindowFunction is now this:
> > > >> > >
> > > >> > > public abstract class ProcessWindowFunction <IN, OUT, KEY, W
> > extends
> > > >> > > Window> implements Function {
> > > >> > >
> > > >> > >     public abstract void process(KEY key, Iterable<IN> elements,
> > > >> > > Context
> > > >> > > ctx) throws Exception;
> > > >> > >
> > > >> > >     public abstract class Context {
> > > >> > >         public abstract W window();
> > > >> > >         public abstract void output(OUT value);
> > > >> > >     }
> > > >> > > }
> > > >> > >
> > > >> > > I'm proposing to not expose Collector anymore because it has the
> > > >> > > close() method that should not be called by users. Having the
> > > output()
> > > >> > > call directly on the context should work just as well.
> > > >> > >
> > > >> > > Also, I marked the "adding a firing reason" and "adding firing
> > > >> > > counter" as future work that are only examples of stuff that can
> > be
> > > >> > > implemented on top of the new interface. Initially, this will
> > > provide
> > > >> > > exactly the same features as the old API but be extensible. I
> did
> > > this
> > > >> > > to not make the scope of this proposal to big because Radu also
> > > >> > > suggested more changes and each of them should be covered in a
> > > separate
> > > >> > design doc or FLIP.
> > > >> > >
> > > >> > > @Radu: On the different buffer types. I think this would be very
> > > >> tricky.
> > > >> > > Right now, people should also not rely on the fact that elements
> > are
> > > >> > > "FIFO". Some state backends might keep the elements in a
> different
> > > >> > > order and when you have merging windows/session windows the
> order
> > of
> > > >> > > the elements will also not be preserved.
> > > >> > >
> > > >> > > Cheers,
> > > >> > > Aljoscha
> > > >> > >
> > > >> > > On Wed, 13 Jul 2016 at 18:40 Radu Tudoran <
> > radu.tudoran@huawei.com>
> > > >> > wrote:
> > > >> > >
> > > >> > > > Hi,
> > > >> > > >
> > > >> > > > If it is to extend the Context to pass more information
> between
> > > the
> > > >> > > > stages of processing a window (triggering -> process ->
> > eviction),
> > > >> > > > why not adding also a "EvictionInfo"? I think this might
> > actually
> > > >> > > > help with the issues discussed in the tread related to the
> > > eviction
> > > >> > policy.
> > > >> > > > I could imagine using this parameter to pass the conditions,
> > from
> > > >> > > > the processing stage to the evictor, about what events to be
> > > >> > eliminated.
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > public abstract class Context {
> > > >> > > >
> > > >> > > >    public abstract EvictionInfo evictionInfo();
> > > >> > > >
> > > >> > > > ...
> > > >> > > >
> > > >> > > >
> > > >> > > >    public abstract KEY key();
> > > >> > > >
> > > >> > > >    public abstract W window();
> > > >> > > >
> > > >> > > >    public abstract int id();
> > > >> > > >
> > > >> > > >    public abstract FiringInfo firingInfo();
> > > >> > > >
> > > >> > > >    public abstract Iterable<IN> elements();
> > > >> > > >
> > > >> > > >    public abstract void output(OUT value);
> > > >> > > >
> > > >> > > > }
> > > >> > > >
> > > >> > > >
> > > >> > > > Also on a slightly unrelated issue - how hard it would be to
> > > >> > > > introduce different types of buffers for the windows.
> Currently
> > > the
> > > >> > > > existing one is behaving (when under processing) similar with
> a
> > > FIFO
> > > >> > > > queue (in the sense that you need to start from beginning,
> from
> > > the
> > > >> > oldest element).
> > > >> > > > How about enabling for example also LIFO behavior (start
> > iterating
> > > >> > > > through the list from the most recent element). As in the
> source
> > > >> > > > queues or stacks are not actually used, perhaps we can just
> pass
> > > >> > > > policies to the iterator - or have custom itrators
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > Dr. Radu Tudoran
> > > >> > > > Research Engineer - Big Data Expert
> > > >> > > > IT R&D Division
> > > >> > > >
> > > >> > > >
> > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH European Research Center
> > > >> > > > Riesstrasse 25, 80992 München
> > > >> > > >
> > > >> > > > E-mail: radu.tudoran@huawei.com
> > > >> > > > Mobile: +49 15209084330 <01520%209084330>
> > > >> > > > Telephone: +49 891588344173 <089%201588344173>
> > > >> > > >
> > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH Hansaallee 205, 40549
> > > >> > > > Düsseldorf, Germany, www.huawei.com Registered
> > > >> > > > Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
> > Managing
> > > >> > > > Director: Bo PENG, Wanzhou MENG, Lifang CHEN Sitz der
> > > Gesellschaft:
> > > >> > > > Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
> > > >> > > > Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN This
> e-mail
> > > and
> > > >> > > > its attachments contain confidential information from HUAWEI,
> > > which
> > > >> > > > is intended only for the person or entity whose address is
> > listed
> > > >> > above.
> > > >> > > > Any use of the information contained herein in any way
> > (including,
> > > >> > > > but not limited to, total or partial disclosure, reproduction,
> > or
> > > >> > > > dissemination) by persons other than the intended recipient(s)
> > is
> > > >> > > > prohibited. If you receive this e-mail in error, please notify
> > the
> > > >> > > > sender by phone or email immediately and delete it!
> > > >> > > >
> > > >> > > >
> > > >> > > > -----Original Message-----
> > > >> > > > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > >> > > > Sent: Wednesday, July 13, 2016 2:24 PM
> > > >> > > > To: dev@flink.apache.org
> > > >> > > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function
> Metadata
> > > >> > > >
> > > >> > > > Sure, I also thought about this but went for the "extreme"
> > > initially.
> > > >> > > > If no-one objects I'll update the doc in a bit.
> > > >> > > >
> > > >> > > > On Wed, 13 Jul 2016 at 14:17 Stephan Ewen <se...@apache.org>
> > > wrote:
> > > >> > > >
> > > >> > > > > Thanks for opening this.
> > > >> > > > >
> > > >> > > > > I see the need for having an extensible context object for
> > > window
> > > >> > > > > function invocations, but i think hiding every parameter in
> > the
> > > >> > > > > context is a bit unnatural.
> > > >> > > > >
> > > >> > > > > How about having a function "apply(Key, Values,
> WindowContext,
> > > >> > > > Collector)"
> > > >> > > > > ?
> > > >> > > > > It should be possible to write the straightforward use cases
> > > >> > > > > without accessing the context object.
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Wed, Jul 13, 2016 at 1:56 PM, Aljoscha Krettek
> > > >> > > > > <al...@apache.org>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > Hi,
> > > >> > > > > > this is a proposal to introduce a new interface for the
> > window
> > > >> > > > > > function
> > > >> > > > > to
> > > >> > > > > > make it more extensible for the future where we might want
> > to
> > > >> > > > > > provide additional information about why a window fired to
> > the
> > > >> > > > > > user
> > > >> > > > function:
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending
> > > >> > > > > +W
> > > >> > > > > in
> > > >> > > > > dow+Function+Metadata
> > > >> > > > > >
> > > >> > > > > > I'd appreciate your thoughts!
> > > >> > > > > >
> > > >> > > > > > Cheers,
> > > >> > > > > > Aljoscha
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
>
>

Re: [DISCUSS] FLIP-2 Extending Window Function Metadata

Posted by Manu Zhang <ow...@gmail.com>.
No problem, Aljoscha. I'll follow up with my use cases.

Thanks,
Manu

On Wed, Nov 2, 2016 at 9:15 PM Aljoscha Krettek <al...@apache.org> wrote:

> Hi Manu,
> it's great that you want to work on this but another contributor has also
> started looking into this and has some code already. It's unfortunate that
> there is no Jira issue yet for this but I think he'll open one this week.
>
> Cheers,
> Aljoscha
>
> On Wed, 2 Nov 2016 at 13:00 Manu Zhang <ow...@gmail.com> wrote:
>
> > Hi Aljoscha,
> >
> > Have you started working on ProcessWindowFunction ? If not, may I take
> this
> > task ?
> >
> > Thanks,
> > Manu
> >
> >
> > On Wed, Nov 2, 2016 at 5:16 PM Aljoscha Krettek <al...@apache.org>
> > wrote:
> >
> > > I think we reached consensus here so I would like to mark this FLIP as
> > > accepted. We will now process with implementing the first step, i.e.
> > adding
> > > the new ProcessWindowFunction.
> > >
> > > On Mon, 1 Aug 2016 at 18:08 Aljoscha Krettek <al...@apache.org>
> > wrote:
> > >
> > > > Alright, that seems reasonable. I updated the doc to add the
> Collector
> > to
> > > > the method signature again.
> > > >
> > > > On Mon, 1 Aug 2016 at 00:59 Stephan Ewen <se...@apache.org> wrote:
> > > >
> > > > The Collector is pretty integral in all the other functions that
> return
> > > > multiple elements. I honestly don't see us switching away from it,
> > given
> > > > that it is such a core part of the API.
> > > >
> > > > The close() method has, to my best knowledge, not caused issues,
> yet. I
> > > > cannot recall anyone mentioning that the close() method confused
> them,
> > > they
> > > > accidentally called it, etc.
> > > > I am wondering whether this is more of a theoretical than practical
> > > issue.
> > > >
> > > > If we move one function away from Collector to be on the safe side
> for
> > a
> > > > "maybe change in the future", while keeping Collector in all other
> > > > functions - I think that fragments the API concept wise more than it
> > > > improves anything.
> > > >
> > > > On Sun, Jul 31, 2016 at 7:10 PM, Aljoscha Krettek <
> aljoscha@apache.org
> > >
> > > > wrote:
> > > >
> > > > > @Stephan: For the Output, should we keep using a Collector (which
> > > > exposes)
> > > > > the close() method which should never be called by users or create
> a
> > > new
> > > > > Output type that only has an "output" method. Collector can also be
> > > used
> > > > > but with a close() method that doesn't do anything. In the long
> run,
> > I
> > > > > thought it might be better to switch the type away from Collector.
> > > > >
> > > > > Cheers,
> > > > > Aljoscha
> > > > >
> > > > > On Wed, 20 Jul 2016 at 01:25 Maximilian Michels <mx...@apache.org>
> > > wrote:
> > > > >
> > > > > > I think it looks like Beam rather than Hadoop :)
> > > > > >
> > > > > > What Stephan meant was that he wanted a dedicated output method
> in
> > > the
> > > > > > ProcessWindowFunction. I agree with Aljoscha that we shouldn't
> > expose
> > > > > > the collector.
> > > > > >
> > > > > > On Tue, Jul 19, 2016 at 10:45 PM, Aljoscha Krettek <
> > > > aljoscha@apache.org>
> > > > > > wrote:
> > > > > > > You mean keep the Collector? I don't like that one because it
> has
> > > the
> > > > > > > close() method that should never be called by the user.
> > > > > > >
> > > > > > > We can keep it, though, because all the other user function
> > > > interfaces
> > > > > > also
> > > > > > > expose it to the user.
> > > > > > >
> > > > > > > On Tue, 19 Jul 2016 at 15:22 Stephan Ewen <se...@apache.org>
> > > wrote:
> > > > > > >
> > > > > > >> I would actually make the output a separate parameter as well.
> > > > Pretty
> > > > > > much
> > > > > > >> like the old variant, only replacing the "Window" parameter by
> > the
> > > > > > context
> > > > > > >> (which contains everything about the window).
> > > > > > >> It could also be called "WindowInvocationContext" or so.
> > > > > > >>
> > > > > > >> The current variant looks too Hadoop to me ;-) Everything done
> > on
> > > > the
> > > > > > >> context object, and messy mocking when creating tests.
> > > > > > >>
> > > > > > >> On Mon, Jul 18, 2016 at 6:42 PM, Radu Tudoran <
> > > > > radu.tudoran@huawei.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Hi,
> > > > > > >> >
> > > > > > >> > Sorry - I made a mistake - I was thinking of getting access
> to
> > > the
> > > > > > >> > collection (mist-read :) collector) of events in the window
> > > buffer
> > > > > in
> > > > > > >> > order to be able to delete/evict some of them which are not
> > > > > necessary
> > > > > > the
> > > > > > >> > last ones.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > Radu
> > > > > > >> >
> > > > > > >> > -----Original Message-----
> > > > > > >> > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > > > >> > Sent: Monday, July 18, 2016 5:54 PM
> > > > > > >> > To: dev@flink.apache.org
> > > > > > >> > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function
> > Metadata
> > > > > > >> >
> > > > > > >> > What about the collector? This is only used for emitting
> > > elements
> > > > to
> > > > > > the
> > > > > > >> > downstream operation.
> > > > > > >> >
> > > > > > >> > On Mon, 18 Jul 2016 at 17:52 Radu Tudoran <
> > > > radu.tudoran@huawei.com>
> > > > > > >> wrote:
> > > > > > >> >
> > > > > > >> > > Hi,
> > > > > > >> > >
> > > > > > >> > > I think it looks good and most importantly is that we can
> > > extend
> > > > > it
> > > > > > in
> > > > > > >> > > the directions discussed so far.
> > > > > > >> > >
> > > > > > >> > > One question though regarding the Collector - are we going
> > to
> > > be
> > > > > > able
> > > > > > >> > > to delete random elements from the list if this is not
> > exposed
> > > > as
> > > > > a
> > > > > > >> > > collection, at least to the evictor? If not, how are we
> > going
> > > to
> > > > > > >> > > extend in the future to cover this case?
> > > > > > >> > >
> > > > > > >> > > Regarding the ordering - I also observed that there are
> > > > situations
> > > > > > >> > > where elements do not have a logical order. One example is
> > if
> > > > you
> > > > > > have
> > > > > > >> > > high rates of the events. Nevertheless, even if now is not
> > the
> > > > > time
> > > > > > >> > > for this, I think in the future we can imagine having also
> > > some
> > > > > data
> > > > > > >> > > structures that offer some ordering. It can save some
> > > > computation
> > > > > > >> > > efforts later in the functions for some use cases.
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > Best regards,
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > -----Original Message-----
> > > > > > >> > > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > > > >> > > Sent: Monday, July 18, 2016 3:45 PM
> > > > > > >> > > To: dev@flink.apache.org
> > > > > > >> > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function
> > > Metadata
> > > > > > >> > >
> > > > > > >> > > I incorporated the changes. The proposed interface of
> > > > > > >> > > ProcessWindowFunction is now this:
> > > > > > >> > >
> > > > > > >> > > public abstract class ProcessWindowFunction <IN, OUT,
> KEY, W
> > > > > extends
> > > > > > >> > > Window> implements Function {
> > > > > > >> > >
> > > > > > >> > >     public abstract void process(KEY key, Iterable<IN>
> > > elements,
> > > > > > >> > > Context
> > > > > > >> > > ctx) throws Exception;
> > > > > > >> > >
> > > > > > >> > >     public abstract class Context {
> > > > > > >> > >         public abstract W window();
> > > > > > >> > >         public abstract void output(OUT value);
> > > > > > >> > >     }
> > > > > > >> > > }
> > > > > > >> > >
> > > > > > >> > > I'm proposing to not expose Collector anymore because it
> has
> > > the
> > > > > > >> > > close() method that should not be called by users. Having
> > the
> > > > > > output()
> > > > > > >> > > call directly on the context should work just as well.
> > > > > > >> > >
> > > > > > >> > > Also, I marked the "adding a firing reason" and "adding
> > firing
> > > > > > >> > > counter" as future work that are only examples of stuff
> that
> > > can
> > > > > be
> > > > > > >> > > implemented on top of the new interface. Initially, this
> > will
> > > > > > provide
> > > > > > >> > > exactly the same features as the old API but be
> extensible.
> > I
> > > > did
> > > > > > this
> > > > > > >> > > to not make the scope of this proposal to big because Radu
> > > also
> > > > > > >> > > suggested more changes and each of them should be covered
> > in a
> > > > > > separate
> > > > > > >> > design doc or FLIP.
> > > > > > >> > >
> > > > > > >> > > @Radu: On the different buffer types. I think this would
> be
> > > very
> > > > > > >> tricky.
> > > > > > >> > > Right now, people should also not rely on the fact that
> > > elements
> > > > > are
> > > > > > >> > > "FIFO". Some state backends might keep the elements in a
> > > > different
> > > > > > >> > > order and when you have merging windows/session windows
> the
> > > > order
> > > > > of
> > > > > > >> > > the elements will also not be preserved.
> > > > > > >> > >
> > > > > > >> > > Cheers,
> > > > > > >> > > Aljoscha
> > > > > > >> > >
> > > > > > >> > > On Wed, 13 Jul 2016 at 18:40 Radu Tudoran <
> > > > > radu.tudoran@huawei.com>
> > > > > > >> > wrote:
> > > > > > >> > >
> > > > > > >> > > > Hi,
> > > > > > >> > > >
> > > > > > >> > > > If it is to extend the Context to pass more information
> > > > between
> > > > > > the
> > > > > > >> > > > stages of processing a window (triggering -> process ->
> > > > > eviction),
> > > > > > >> > > > why not adding also a "EvictionInfo"? I think this might
> > > > > actually
> > > > > > >> > > > help with the issues discussed in the tread related to
> the
> > > > > > eviction
> > > > > > >> > policy.
> > > > > > >> > > > I could imagine using this parameter to pass the
> > conditions,
> > > > > from
> > > > > > >> > > > the processing stage to the evictor, about what events
> to
> > be
> > > > > > >> > eliminated.
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > public abstract class Context {
> > > > > > >> > > >
> > > > > > >> > > >    public abstract EvictionInfo evictionInfo();
> > > > > > >> > > >
> > > > > > >> > > > ...
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >    public abstract KEY key();
> > > > > > >> > > >
> > > > > > >> > > >    public abstract W window();
> > > > > > >> > > >
> > > > > > >> > > >    public abstract int id();
> > > > > > >> > > >
> > > > > > >> > > >    public abstract FiringInfo firingInfo();
> > > > > > >> > > >
> > > > > > >> > > >    public abstract Iterable<IN> elements();
> > > > > > >> > > >
> > > > > > >> > > >    public abstract void output(OUT value);
> > > > > > >> > > >
> > > > > > >> > > > }
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > Also on a slightly unrelated issue - how hard it would
> be
> > to
> > > > > > >> > > > introduce different types of buffers for the windows.
> > > > Currently
> > > > > > the
> > > > > > >> > > > existing one is behaving (when under processing) similar
> > > with
> > > > a
> > > > > > FIFO
> > > > > > >> > > > queue (in the sense that you need to start from
> beginning,
> > > > from
> > > > > > the
> > > > > > >> > oldest element).
> > > > > > >> > > > How about enabling for example also LIFO behavior (start
> > > > > iterating
> > > > > > >> > > > through the list from the most recent element). As in
> the
> > > > source
> > > > > > >> > > > queues or stacks are not actually used, perhaps we can
> > just
> > > > pass
> > > > > > >> > > > policies to the iterator - or have custom itrators
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > Dr. Radu Tudoran
> > > > > > >> > > > Research Engineer - Big Data Expert
> > > > > > >> > > > IT R&D Division
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH European Research
> > > Center
> > > > > > >> > > > Riesstrasse 25, 80992 München
> > > > > > >> > > >
> > > > > > >> > > > E-mail: radu.tudoran@huawei.com
> > > > > > >> > > > Mobile: +49 15209084330 <+49%201520%209084330>
> <01520%209084330>
> > <+49%201520%209084330>
> > > <01520%209084330>
> > > > > > >> > > > Telephone: +49 891588344173 <+49%2089%201588344173>
> <089%201588344173>
> > <+49%2089%201588344173>
> > > <089%201588344173>
> > > > > > >> > > >
> > > > > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH Hansaallee 205,
> 40549
> > > > > > >> > > > Düsseldorf, Germany, www.huawei.com Registered
> > > > > > >> > > > Office: Düsseldorf, Register Court Düsseldorf, HRB
> 56063,
> > > > > Managing
> > > > > > >> > > > Director: Bo PENG, Wanzhou MENG, Lifang CHEN Sitz der
> > > > > > Gesellschaft:
> > > > > > >> > > > Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
> > > > > > >> > > > Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN This
> > > > e-mail
> > > > > > and
> > > > > > >> > > > its attachments contain confidential information from
> > > HUAWEI,
> > > > > > which
> > > > > > >> > > > is intended only for the person or entity whose address
> is
> > > > > listed
> > > > > > >> > above.
> > > > > > >> > > > Any use of the information contained herein in any way
> > > > > (including,
> > > > > > >> > > > but not limited to, total or partial disclosure,
> > > reproduction,
> > > > > or
> > > > > > >> > > > dissemination) by persons other than the intended
> > > recipient(s)
> > > > > is
> > > > > > >> > > > prohibited. If you receive this e-mail in error, please
> > > notify
> > > > > the
> > > > > > >> > > > sender by phone or email immediately and delete it!
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > -----Original Message-----
> > > > > > >> > > > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > > > >> > > > Sent: Wednesday, July 13, 2016 2:24 PM
> > > > > > >> > > > To: dev@flink.apache.org
> > > > > > >> > > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function
> > > > Metadata
> > > > > > >> > > >
> > > > > > >> > > > Sure, I also thought about this but went for the
> "extreme"
> > > > > > initially.
> > > > > > >> > > > If no-one objects I'll update the doc in a bit.
> > > > > > >> > > >
> > > > > > >> > > > On Wed, 13 Jul 2016 at 14:17 Stephan Ewen <
> > sewen@apache.org
> > > >
> > > > > > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Thanks for opening this.
> > > > > > >> > > > >
> > > > > > >> > > > > I see the need for having an extensible context object
> > for
> > > > > > window
> > > > > > >> > > > > function invocations, but i think hiding every
> parameter
> > > in
> > > > > the
> > > > > > >> > > > > context is a bit unnatural.
> > > > > > >> > > > >
> > > > > > >> > > > > How about having a function "apply(Key, Values,
> > > > WindowContext,
> > > > > > >> > > > Collector)"
> > > > > > >> > > > > ?
> > > > > > >> > > > > It should be possible to write the straightforward use
> > > cases
> > > > > > >> > > > > without accessing the context object.
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > > On Wed, Jul 13, 2016 at 1:56 PM, Aljoscha Krettek
> > > > > > >> > > > > <al...@apache.org>
> > > > > > >> > > > > wrote:
> > > > > > >> > > > >
> > > > > > >> > > > > > Hi,
> > > > > > >> > > > > > this is a proposal to introduce a new interface for
> > the
> > > > > window
> > > > > > >> > > > > > function
> > > > > > >> > > > > to
> > > > > > >> > > > > > make it more extensible for the future where we
> might
> > > want
> > > > > to
> > > > > > >> > > > > > provide additional information about why a window
> > fired
> > > to
> > > > > the
> > > > > > >> > > > > > user
> > > > > > >> > > > function:
> > > > > > >> > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending
> > > > > > >> > > > > +W
> > > > > > >> > > > > in
> > > > > > >> > > > > dow+Function+Metadata
> > > > > > >> > > > > >
> > > > > > >> > > > > > I'd appreciate your thoughts!
> > > > > > >> > > > > >
> > > > > > >> > > > > > Cheers,
> > > > > > >> > > > > > Aljoscha
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-2 Extending Window Function Metadata

Posted by Aljoscha Krettek <al...@apache.org>.
Hi Manu,
it's great that you want to work on this but another contributor has also
started looking into this and has some code already. It's unfortunate that
there is no Jira issue yet for this but I think he'll open one this week.

Cheers,
Aljoscha

On Wed, 2 Nov 2016 at 13:00 Manu Zhang <ow...@gmail.com> wrote:

> Hi Aljoscha,
>
> Have you started working on ProcessWindowFunction ? If not, may I take this
> task ?
>
> Thanks,
> Manu
>
>
> On Wed, Nov 2, 2016 at 5:16 PM Aljoscha Krettek <al...@apache.org>
> wrote:
>
> > I think we reached consensus here so I would like to mark this FLIP as
> > accepted. We will now process with implementing the first step, i.e.
> adding
> > the new ProcessWindowFunction.
> >
> > On Mon, 1 Aug 2016 at 18:08 Aljoscha Krettek <al...@apache.org>
> wrote:
> >
> > > Alright, that seems reasonable. I updated the doc to add the Collector
> to
> > > the method signature again.
> > >
> > > On Mon, 1 Aug 2016 at 00:59 Stephan Ewen <se...@apache.org> wrote:
> > >
> > > The Collector is pretty integral in all the other functions that return
> > > multiple elements. I honestly don't see us switching away from it,
> given
> > > that it is such a core part of the API.
> > >
> > > The close() method has, to my best knowledge, not caused issues, yet. I
> > > cannot recall anyone mentioning that the close() method confused them,
> > they
> > > accidentally called it, etc.
> > > I am wondering whether this is more of a theoretical than practical
> > issue.
> > >
> > > If we move one function away from Collector to be on the safe side for
> a
> > > "maybe change in the future", while keeping Collector in all other
> > > functions - I think that fragments the API concept wise more than it
> > > improves anything.
> > >
> > > On Sun, Jul 31, 2016 at 7:10 PM, Aljoscha Krettek <aljoscha@apache.org
> >
> > > wrote:
> > >
> > > > @Stephan: For the Output, should we keep using a Collector (which
> > > exposes)
> > > > the close() method which should never be called by users or create a
> > new
> > > > Output type that only has an "output" method. Collector can also be
> > used
> > > > but with a close() method that doesn't do anything. In the long run,
> I
> > > > thought it might be better to switch the type away from Collector.
> > > >
> > > > Cheers,
> > > > Aljoscha
> > > >
> > > > On Wed, 20 Jul 2016 at 01:25 Maximilian Michels <mx...@apache.org>
> > wrote:
> > > >
> > > > > I think it looks like Beam rather than Hadoop :)
> > > > >
> > > > > What Stephan meant was that he wanted a dedicated output method in
> > the
> > > > > ProcessWindowFunction. I agree with Aljoscha that we shouldn't
> expose
> > > > > the collector.
> > > > >
> > > > > On Tue, Jul 19, 2016 at 10:45 PM, Aljoscha Krettek <
> > > aljoscha@apache.org>
> > > > > wrote:
> > > > > > You mean keep the Collector? I don't like that one because it has
> > the
> > > > > > close() method that should never be called by the user.
> > > > > >
> > > > > > We can keep it, though, because all the other user function
> > > interfaces
> > > > > also
> > > > > > expose it to the user.
> > > > > >
> > > > > > On Tue, 19 Jul 2016 at 15:22 Stephan Ewen <se...@apache.org>
> > wrote:
> > > > > >
> > > > > >> I would actually make the output a separate parameter as well.
> > > Pretty
> > > > > much
> > > > > >> like the old variant, only replacing the "Window" parameter by
> the
> > > > > context
> > > > > >> (which contains everything about the window).
> > > > > >> It could also be called "WindowInvocationContext" or so.
> > > > > >>
> > > > > >> The current variant looks too Hadoop to me ;-) Everything done
> on
> > > the
> > > > > >> context object, and messy mocking when creating tests.
> > > > > >>
> > > > > >> On Mon, Jul 18, 2016 at 6:42 PM, Radu Tudoran <
> > > > radu.tudoran@huawei.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi,
> > > > > >> >
> > > > > >> > Sorry - I made a mistake - I was thinking of getting access to
> > the
> > > > > >> > collection (mist-read :) collector) of events in the window
> > buffer
> > > > in
> > > > > >> > order to be able to delete/evict some of them which are not
> > > > necessary
> > > > > the
> > > > > >> > last ones.
> > > > > >> >
> > > > > >> >
> > > > > >> > Radu
> > > > > >> >
> > > > > >> > -----Original Message-----
> > > > > >> > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > > >> > Sent: Monday, July 18, 2016 5:54 PM
> > > > > >> > To: dev@flink.apache.org
> > > > > >> > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function
> Metadata
> > > > > >> >
> > > > > >> > What about the collector? This is only used for emitting
> > elements
> > > to
> > > > > the
> > > > > >> > downstream operation.
> > > > > >> >
> > > > > >> > On Mon, 18 Jul 2016 at 17:52 Radu Tudoran <
> > > radu.tudoran@huawei.com>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Hi,
> > > > > >> > >
> > > > > >> > > I think it looks good and most importantly is that we can
> > extend
> > > > it
> > > > > in
> > > > > >> > > the directions discussed so far.
> > > > > >> > >
> > > > > >> > > One question though regarding the Collector - are we going
> to
> > be
> > > > > able
> > > > > >> > > to delete random elements from the list if this is not
> exposed
> > > as
> > > > a
> > > > > >> > > collection, at least to the evictor? If not, how are we
> going
> > to
> > > > > >> > > extend in the future to cover this case?
> > > > > >> > >
> > > > > >> > > Regarding the ordering - I also observed that there are
> > > situations
> > > > > >> > > where elements do not have a logical order. One example is
> if
> > > you
> > > > > have
> > > > > >> > > high rates of the events. Nevertheless, even if now is not
> the
> > > > time
> > > > > >> > > for this, I think in the future we can imagine having also
> > some
> > > > data
> > > > > >> > > structures that offer some ordering. It can save some
> > > computation
> > > > > >> > > efforts later in the functions for some use cases.
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > Best regards,
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > -----Original Message-----
> > > > > >> > > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > > >> > > Sent: Monday, July 18, 2016 3:45 PM
> > > > > >> > > To: dev@flink.apache.org
> > > > > >> > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function
> > Metadata
> > > > > >> > >
> > > > > >> > > I incorporated the changes. The proposed interface of
> > > > > >> > > ProcessWindowFunction is now this:
> > > > > >> > >
> > > > > >> > > public abstract class ProcessWindowFunction <IN, OUT, KEY, W
> > > > extends
> > > > > >> > > Window> implements Function {
> > > > > >> > >
> > > > > >> > >     public abstract void process(KEY key, Iterable<IN>
> > elements,
> > > > > >> > > Context
> > > > > >> > > ctx) throws Exception;
> > > > > >> > >
> > > > > >> > >     public abstract class Context {
> > > > > >> > >         public abstract W window();
> > > > > >> > >         public abstract void output(OUT value);
> > > > > >> > >     }
> > > > > >> > > }
> > > > > >> > >
> > > > > >> > > I'm proposing to not expose Collector anymore because it has
> > the
> > > > > >> > > close() method that should not be called by users. Having
> the
> > > > > output()
> > > > > >> > > call directly on the context should work just as well.
> > > > > >> > >
> > > > > >> > > Also, I marked the "adding a firing reason" and "adding
> firing
> > > > > >> > > counter" as future work that are only examples of stuff that
> > can
> > > > be
> > > > > >> > > implemented on top of the new interface. Initially, this
> will
> > > > > provide
> > > > > >> > > exactly the same features as the old API but be extensible.
> I
> > > did
> > > > > this
> > > > > >> > > to not make the scope of this proposal to big because Radu
> > also
> > > > > >> > > suggested more changes and each of them should be covered
> in a
> > > > > separate
> > > > > >> > design doc or FLIP.
> > > > > >> > >
> > > > > >> > > @Radu: On the different buffer types. I think this would be
> > very
> > > > > >> tricky.
> > > > > >> > > Right now, people should also not rely on the fact that
> > elements
> > > > are
> > > > > >> > > "FIFO". Some state backends might keep the elements in a
> > > different
> > > > > >> > > order and when you have merging windows/session windows the
> > > order
> > > > of
> > > > > >> > > the elements will also not be preserved.
> > > > > >> > >
> > > > > >> > > Cheers,
> > > > > >> > > Aljoscha
> > > > > >> > >
> > > > > >> > > On Wed, 13 Jul 2016 at 18:40 Radu Tudoran <
> > > > radu.tudoran@huawei.com>
> > > > > >> > wrote:
> > > > > >> > >
> > > > > >> > > > Hi,
> > > > > >> > > >
> > > > > >> > > > If it is to extend the Context to pass more information
> > > between
> > > > > the
> > > > > >> > > > stages of processing a window (triggering -> process ->
> > > > eviction),
> > > > > >> > > > why not adding also a "EvictionInfo"? I think this might
> > > > actually
> > > > > >> > > > help with the issues discussed in the tread related to the
> > > > > eviction
> > > > > >> > policy.
> > > > > >> > > > I could imagine using this parameter to pass the
> conditions,
> > > > from
> > > > > >> > > > the processing stage to the evictor, about what events to
> be
> > > > > >> > eliminated.
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > public abstract class Context {
> > > > > >> > > >
> > > > > >> > > >    public abstract EvictionInfo evictionInfo();
> > > > > >> > > >
> > > > > >> > > > ...
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >    public abstract KEY key();
> > > > > >> > > >
> > > > > >> > > >    public abstract W window();
> > > > > >> > > >
> > > > > >> > > >    public abstract int id();
> > > > > >> > > >
> > > > > >> > > >    public abstract FiringInfo firingInfo();
> > > > > >> > > >
> > > > > >> > > >    public abstract Iterable<IN> elements();
> > > > > >> > > >
> > > > > >> > > >    public abstract void output(OUT value);
> > > > > >> > > >
> > > > > >> > > > }
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > Also on a slightly unrelated issue - how hard it would be
> to
> > > > > >> > > > introduce different types of buffers for the windows.
> > > Currently
> > > > > the
> > > > > >> > > > existing one is behaving (when under processing) similar
> > with
> > > a
> > > > > FIFO
> > > > > >> > > > queue (in the sense that you need to start from beginning,
> > > from
> > > > > the
> > > > > >> > oldest element).
> > > > > >> > > > How about enabling for example also LIFO behavior (start
> > > > iterating
> > > > > >> > > > through the list from the most recent element). As in the
> > > source
> > > > > >> > > > queues or stacks are not actually used, perhaps we can
> just
> > > pass
> > > > > >> > > > policies to the iterator - or have custom itrators
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > Dr. Radu Tudoran
> > > > > >> > > > Research Engineer - Big Data Expert
> > > > > >> > > > IT R&D Division
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH European Research
> > Center
> > > > > >> > > > Riesstrasse 25, 80992 München
> > > > > >> > > >
> > > > > >> > > > E-mail: radu.tudoran@huawei.com
> > > > > >> > > > Mobile: +49 15209084330 <01520%209084330>
> <+49%201520%209084330>
> > <01520%209084330>
> > > > > >> > > > Telephone: +49 891588344173 <089%201588344173>
> <+49%2089%201588344173>
> > <089%201588344173>
> > > > > >> > > >
> > > > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH Hansaallee 205, 40549
> > > > > >> > > > Düsseldorf, Germany, www.huawei.com Registered
> > > > > >> > > > Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
> > > > Managing
> > > > > >> > > > Director: Bo PENG, Wanzhou MENG, Lifang CHEN Sitz der
> > > > > Gesellschaft:
> > > > > >> > > > Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
> > > > > >> > > > Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN This
> > > e-mail
> > > > > and
> > > > > >> > > > its attachments contain confidential information from
> > HUAWEI,
> > > > > which
> > > > > >> > > > is intended only for the person or entity whose address is
> > > > listed
> > > > > >> > above.
> > > > > >> > > > Any use of the information contained herein in any way
> > > > (including,
> > > > > >> > > > but not limited to, total or partial disclosure,
> > reproduction,
> > > > or
> > > > > >> > > > dissemination) by persons other than the intended
> > recipient(s)
> > > > is
> > > > > >> > > > prohibited. If you receive this e-mail in error, please
> > notify
> > > > the
> > > > > >> > > > sender by phone or email immediately and delete it!
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > -----Original Message-----
> > > > > >> > > > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > > >> > > > Sent: Wednesday, July 13, 2016 2:24 PM
> > > > > >> > > > To: dev@flink.apache.org
> > > > > >> > > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function
> > > Metadata
> > > > > >> > > >
> > > > > >> > > > Sure, I also thought about this but went for the "extreme"
> > > > > initially.
> > > > > >> > > > If no-one objects I'll update the doc in a bit.
> > > > > >> > > >
> > > > > >> > > > On Wed, 13 Jul 2016 at 14:17 Stephan Ewen <
> sewen@apache.org
> > >
> > > > > wrote:
> > > > > >> > > >
> > > > > >> > > > > Thanks for opening this.
> > > > > >> > > > >
> > > > > >> > > > > I see the need for having an extensible context object
> for
> > > > > window
> > > > > >> > > > > function invocations, but i think hiding every parameter
> > in
> > > > the
> > > > > >> > > > > context is a bit unnatural.
> > > > > >> > > > >
> > > > > >> > > > > How about having a function "apply(Key, Values,
> > > WindowContext,
> > > > > >> > > > Collector)"
> > > > > >> > > > > ?
> > > > > >> > > > > It should be possible to write the straightforward use
> > cases
> > > > > >> > > > > without accessing the context object.
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On Wed, Jul 13, 2016 at 1:56 PM, Aljoscha Krettek
> > > > > >> > > > > <al...@apache.org>
> > > > > >> > > > > wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Hi,
> > > > > >> > > > > > this is a proposal to introduce a new interface for
> the
> > > > window
> > > > > >> > > > > > function
> > > > > >> > > > > to
> > > > > >> > > > > > make it more extensible for the future where we might
> > want
> > > > to
> > > > > >> > > > > > provide additional information about why a window
> fired
> > to
> > > > the
> > > > > >> > > > > > user
> > > > > >> > > > function:
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending
> > > > > >> > > > > +W
> > > > > >> > > > > in
> > > > > >> > > > > dow+Function+Metadata
> > > > > >> > > > > >
> > > > > >> > > > > > I'd appreciate your thoughts!
> > > > > >> > > > > >
> > > > > >> > > > > > Cheers,
> > > > > >> > > > > > Aljoscha
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> > >
> >
>

Re: [DISCUSS] FLIP-2 Extending Window Function Metadata

Posted by Manu Zhang <ow...@gmail.com>.
Hi Aljoscha,

Have you started working on ProcessWindowFunction ? If not, may I take this
task ?

Thanks,
Manu


On Wed, Nov 2, 2016 at 5:16 PM Aljoscha Krettek <al...@apache.org> wrote:

> I think we reached consensus here so I would like to mark this FLIP as
> accepted. We will now process with implementing the first step, i.e. adding
> the new ProcessWindowFunction.
>
> On Mon, 1 Aug 2016 at 18:08 Aljoscha Krettek <al...@apache.org> wrote:
>
> > Alright, that seems reasonable. I updated the doc to add the Collector to
> > the method signature again.
> >
> > On Mon, 1 Aug 2016 at 00:59 Stephan Ewen <se...@apache.org> wrote:
> >
> > The Collector is pretty integral in all the other functions that return
> > multiple elements. I honestly don't see us switching away from it, given
> > that it is such a core part of the API.
> >
> > The close() method has, to my best knowledge, not caused issues, yet. I
> > cannot recall anyone mentioning that the close() method confused them,
> they
> > accidentally called it, etc.
> > I am wondering whether this is more of a theoretical than practical
> issue.
> >
> > If we move one function away from Collector to be on the safe side for a
> > "maybe change in the future", while keeping Collector in all other
> > functions - I think that fragments the API concept wise more than it
> > improves anything.
> >
> > On Sun, Jul 31, 2016 at 7:10 PM, Aljoscha Krettek <al...@apache.org>
> > wrote:
> >
> > > @Stephan: For the Output, should we keep using a Collector (which
> > exposes)
> > > the close() method which should never be called by users or create a
> new
> > > Output type that only has an "output" method. Collector can also be
> used
> > > but with a close() method that doesn't do anything. In the long run, I
> > > thought it might be better to switch the type away from Collector.
> > >
> > > Cheers,
> > > Aljoscha
> > >
> > > On Wed, 20 Jul 2016 at 01:25 Maximilian Michels <mx...@apache.org>
> wrote:
> > >
> > > > I think it looks like Beam rather than Hadoop :)
> > > >
> > > > What Stephan meant was that he wanted a dedicated output method in
> the
> > > > ProcessWindowFunction. I agree with Aljoscha that we shouldn't expose
> > > > the collector.
> > > >
> > > > On Tue, Jul 19, 2016 at 10:45 PM, Aljoscha Krettek <
> > aljoscha@apache.org>
> > > > wrote:
> > > > > You mean keep the Collector? I don't like that one because it has
> the
> > > > > close() method that should never be called by the user.
> > > > >
> > > > > We can keep it, though, because all the other user function
> > interfaces
> > > > also
> > > > > expose it to the user.
> > > > >
> > > > > On Tue, 19 Jul 2016 at 15:22 Stephan Ewen <se...@apache.org>
> wrote:
> > > > >
> > > > >> I would actually make the output a separate parameter as well.
> > Pretty
> > > > much
> > > > >> like the old variant, only replacing the "Window" parameter by the
> > > > context
> > > > >> (which contains everything about the window).
> > > > >> It could also be called "WindowInvocationContext" or so.
> > > > >>
> > > > >> The current variant looks too Hadoop to me ;-) Everything done on
> > the
> > > > >> context object, and messy mocking when creating tests.
> > > > >>
> > > > >> On Mon, Jul 18, 2016 at 6:42 PM, Radu Tudoran <
> > > radu.tudoran@huawei.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Hi,
> > > > >> >
> > > > >> > Sorry - I made a mistake - I was thinking of getting access to
> the
> > > > >> > collection (mist-read :) collector) of events in the window
> buffer
> > > in
> > > > >> > order to be able to delete/evict some of them which are not
> > > necessary
> > > > the
> > > > >> > last ones.
> > > > >> >
> > > > >> >
> > > > >> > Radu
> > > > >> >
> > > > >> > -----Original Message-----
> > > > >> > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > >> > Sent: Monday, July 18, 2016 5:54 PM
> > > > >> > To: dev@flink.apache.org
> > > > >> > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function Metadata
> > > > >> >
> > > > >> > What about the collector? This is only used for emitting
> elements
> > to
> > > > the
> > > > >> > downstream operation.
> > > > >> >
> > > > >> > On Mon, 18 Jul 2016 at 17:52 Radu Tudoran <
> > radu.tudoran@huawei.com>
> > > > >> wrote:
> > > > >> >
> > > > >> > > Hi,
> > > > >> > >
> > > > >> > > I think it looks good and most importantly is that we can
> extend
> > > it
> > > > in
> > > > >> > > the directions discussed so far.
> > > > >> > >
> > > > >> > > One question though regarding the Collector - are we going to
> be
> > > > able
> > > > >> > > to delete random elements from the list if this is not exposed
> > as
> > > a
> > > > >> > > collection, at least to the evictor? If not, how are we going
> to
> > > > >> > > extend in the future to cover this case?
> > > > >> > >
> > > > >> > > Regarding the ordering - I also observed that there are
> > situations
> > > > >> > > where elements do not have a logical order. One example is if
> > you
> > > > have
> > > > >> > > high rates of the events. Nevertheless, even if now is not the
> > > time
> > > > >> > > for this, I think in the future we can imagine having also
> some
> > > data
> > > > >> > > structures that offer some ordering. It can save some
> > computation
> > > > >> > > efforts later in the functions for some use cases.
> > > > >> > >
> > > > >> > >
> > > > >> > > Best regards,
> > > > >> > >
> > > > >> > >
> > > > >> > > -----Original Message-----
> > > > >> > > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > >> > > Sent: Monday, July 18, 2016 3:45 PM
> > > > >> > > To: dev@flink.apache.org
> > > > >> > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function
> Metadata
> > > > >> > >
> > > > >> > > I incorporated the changes. The proposed interface of
> > > > >> > > ProcessWindowFunction is now this:
> > > > >> > >
> > > > >> > > public abstract class ProcessWindowFunction <IN, OUT, KEY, W
> > > extends
> > > > >> > > Window> implements Function {
> > > > >> > >
> > > > >> > >     public abstract void process(KEY key, Iterable<IN>
> elements,
> > > > >> > > Context
> > > > >> > > ctx) throws Exception;
> > > > >> > >
> > > > >> > >     public abstract class Context {
> > > > >> > >         public abstract W window();
> > > > >> > >         public abstract void output(OUT value);
> > > > >> > >     }
> > > > >> > > }
> > > > >> > >
> > > > >> > > I'm proposing to not expose Collector anymore because it has
> the
> > > > >> > > close() method that should not be called by users. Having the
> > > > output()
> > > > >> > > call directly on the context should work just as well.
> > > > >> > >
> > > > >> > > Also, I marked the "adding a firing reason" and "adding firing
> > > > >> > > counter" as future work that are only examples of stuff that
> can
> > > be
> > > > >> > > implemented on top of the new interface. Initially, this will
> > > > provide
> > > > >> > > exactly the same features as the old API but be extensible. I
> > did
> > > > this
> > > > >> > > to not make the scope of this proposal to big because Radu
> also
> > > > >> > > suggested more changes and each of them should be covered in a
> > > > separate
> > > > >> > design doc or FLIP.
> > > > >> > >
> > > > >> > > @Radu: On the different buffer types. I think this would be
> very
> > > > >> tricky.
> > > > >> > > Right now, people should also not rely on the fact that
> elements
> > > are
> > > > >> > > "FIFO". Some state backends might keep the elements in a
> > different
> > > > >> > > order and when you have merging windows/session windows the
> > order
> > > of
> > > > >> > > the elements will also not be preserved.
> > > > >> > >
> > > > >> > > Cheers,
> > > > >> > > Aljoscha
> > > > >> > >
> > > > >> > > On Wed, 13 Jul 2016 at 18:40 Radu Tudoran <
> > > radu.tudoran@huawei.com>
> > > > >> > wrote:
> > > > >> > >
> > > > >> > > > Hi,
> > > > >> > > >
> > > > >> > > > If it is to extend the Context to pass more information
> > between
> > > > the
> > > > >> > > > stages of processing a window (triggering -> process ->
> > > eviction),
> > > > >> > > > why not adding also a "EvictionInfo"? I think this might
> > > actually
> > > > >> > > > help with the issues discussed in the tread related to the
> > > > eviction
> > > > >> > policy.
> > > > >> > > > I could imagine using this parameter to pass the conditions,
> > > from
> > > > >> > > > the processing stage to the evictor, about what events to be
> > > > >> > eliminated.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > public abstract class Context {
> > > > >> > > >
> > > > >> > > >    public abstract EvictionInfo evictionInfo();
> > > > >> > > >
> > > > >> > > > ...
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >    public abstract KEY key();
> > > > >> > > >
> > > > >> > > >    public abstract W window();
> > > > >> > > >
> > > > >> > > >    public abstract int id();
> > > > >> > > >
> > > > >> > > >    public abstract FiringInfo firingInfo();
> > > > >> > > >
> > > > >> > > >    public abstract Iterable<IN> elements();
> > > > >> > > >
> > > > >> > > >    public abstract void output(OUT value);
> > > > >> > > >
> > > > >> > > > }
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Also on a slightly unrelated issue - how hard it would be to
> > > > >> > > > introduce different types of buffers for the windows.
> > Currently
> > > > the
> > > > >> > > > existing one is behaving (when under processing) similar
> with
> > a
> > > > FIFO
> > > > >> > > > queue (in the sense that you need to start from beginning,
> > from
> > > > the
> > > > >> > oldest element).
> > > > >> > > > How about enabling for example also LIFO behavior (start
> > > iterating
> > > > >> > > > through the list from the most recent element). As in the
> > source
> > > > >> > > > queues or stacks are not actually used, perhaps we can just
> > pass
> > > > >> > > > policies to the iterator - or have custom itrators
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Dr. Radu Tudoran
> > > > >> > > > Research Engineer - Big Data Expert
> > > > >> > > > IT R&D Division
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH European Research
> Center
> > > > >> > > > Riesstrasse 25, 80992 München
> > > > >> > > >
> > > > >> > > > E-mail: radu.tudoran@huawei.com
> > > > >> > > > Mobile: +49 15209084330 <+49%201520%209084330>
> <01520%209084330>
> > > > >> > > > Telephone: +49 891588344173 <+49%2089%201588344173>
> <089%201588344173>
> > > > >> > > >
> > > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH Hansaallee 205, 40549
> > > > >> > > > Düsseldorf, Germany, www.huawei.com Registered
> > > > >> > > > Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
> > > Managing
> > > > >> > > > Director: Bo PENG, Wanzhou MENG, Lifang CHEN Sitz der
> > > > Gesellschaft:
> > > > >> > > > Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
> > > > >> > > > Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN This
> > e-mail
> > > > and
> > > > >> > > > its attachments contain confidential information from
> HUAWEI,
> > > > which
> > > > >> > > > is intended only for the person or entity whose address is
> > > listed
> > > > >> > above.
> > > > >> > > > Any use of the information contained herein in any way
> > > (including,
> > > > >> > > > but not limited to, total or partial disclosure,
> reproduction,
> > > or
> > > > >> > > > dissemination) by persons other than the intended
> recipient(s)
> > > is
> > > > >> > > > prohibited. If you receive this e-mail in error, please
> notify
> > > the
> > > > >> > > > sender by phone or email immediately and delete it!
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > -----Original Message-----
> > > > >> > > > From: Aljoscha Krettek [mailto:aljoscha@apache.org]
> > > > >> > > > Sent: Wednesday, July 13, 2016 2:24 PM
> > > > >> > > > To: dev@flink.apache.org
> > > > >> > > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function
> > Metadata
> > > > >> > > >
> > > > >> > > > Sure, I also thought about this but went for the "extreme"
> > > > initially.
> > > > >> > > > If no-one objects I'll update the doc in a bit.
> > > > >> > > >
> > > > >> > > > On Wed, 13 Jul 2016 at 14:17 Stephan Ewen <sewen@apache.org
> >
> > > > wrote:
> > > > >> > > >
> > > > >> > > > > Thanks for opening this.
> > > > >> > > > >
> > > > >> > > > > I see the need for having an extensible context object for
> > > > window
> > > > >> > > > > function invocations, but i think hiding every parameter
> in
> > > the
> > > > >> > > > > context is a bit unnatural.
> > > > >> > > > >
> > > > >> > > > > How about having a function "apply(Key, Values,
> > WindowContext,
> > > > >> > > > Collector)"
> > > > >> > > > > ?
> > > > >> > > > > It should be possible to write the straightforward use
> cases
> > > > >> > > > > without accessing the context object.
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > On Wed, Jul 13, 2016 at 1:56 PM, Aljoscha Krettek
> > > > >> > > > > <al...@apache.org>
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi,
> > > > >> > > > > > this is a proposal to introduce a new interface for the
> > > window
> > > > >> > > > > > function
> > > > >> > > > > to
> > > > >> > > > > > make it more extensible for the future where we might
> want
> > > to
> > > > >> > > > > > provide additional information about why a window fired
> to
> > > the
> > > > >> > > > > > user
> > > > >> > > > function:
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending
> > > > >> > > > > +W
> > > > >> > > > > in
> > > > >> > > > > dow+Function+Metadata
> > > > >> > > > > >
> > > > >> > > > > > I'd appreciate your thoughts!
> > > > >> > > > > >
> > > > >> > > > > > Cheers,
> > > > >> > > > > > Aljoscha
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> >
>