Posted to dev@flink.apache.org by Akshay Dixit <ak...@gmail.com> on 2015/03/15 15:22:39 UTC

[GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534

Hi,
I'm Akshay Dixit[1], a 4th-year undergrad at VIT Vellore, India. I'm
currently interested in distributed systems and stream processing, am
looking to delve deeper into the subject, and hope to get some insight
by contributing to Apache Flink. I've gained some familiarity with the
flink-streaming codebase by recently working on a PR for FLINK-1450[2].

Both FLINK-1617[3] and FLINK-1534[4] are interesting projects that I would
love to work on over the summer. I was wondering which amongst these would
be more appreciated by the community, so I can start working towards a
proposal for either one.

Regarding FLINK-1534, I was wondering why simply merging and filtering
the existing streams for the events we want to detect would not work.
Also, on going through the document mentioned by @mbalassi in the JIRA
comment[5], the authors describe some Runtime Event Detection concepts
in Section 5.2. I'm assuming the project entails building something
analogous using Flink, and that the deliverables would include working
pattern-matching operators over Flink DataStreams as described in the
report. If so, shouldn't it be trivial to implement the described
Binary operator using a WindowedStream and a Filter?
I hope my questions don't seem misplaced here, and I would appreciate
links to literature where I can learn more about the topic.
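
To make the question concrete, here is a rough sketch of the naive
merge-and-filter approach I'm referring to, written against the Java
DataStream API. The SensorEvent type and the predicates are invented for
illustration, exact API details may differ between Flink versions, and
this glosses over the state that would be needed to actually correlate
events:

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class NaiveMergeFilterSketch {

    // Hypothetical event type, only for this sketch.
    public static class SensorEvent {
        public String kind;   // e.g. "TEMPERATURE" or "SMOKE"
        public double value;

        public SensorEvent() {}

        public SensorEvent(String kind, double value) {
            this.kind = kind;
            this.value = value;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<SensorEvent> temperature =
                env.fromElements(new SensorEvent("TEMPERATURE", 55.0));
        DataStream<SensorEvent> smoke =
                env.fromElements(new SensorEvent("SMOKE", 1.0));

        // Merge the two sources and keep only the events that could take
        // part in the pattern we want to detect.
        DataStream<SensorEvent> candidates = temperature
                .union(smoke)
                .filter(new FilterFunction<SensorEvent>() {
                    @Override
                    public boolean filter(SensorEvent e) {
                        return ("TEMPERATURE".equals(e.kind) && e.value > 50.0)
                                || "SMOKE".equals(e.kind);
                    }
                });

        candidates.print();
        env.execute("naive-merge-filter-sketch");
    }
}

This only narrows the merged stream down to candidate events; detecting
that one event actually follows another within some time still needs
state or a window on top of it, which I suppose is where the interesting
part of the project starts.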

Regards,
Akshay Dixit

[1] : http://akshaydixi.me
[2] : https://github.com/apache/flink/pull/481
[3] : https://issues.apache.org/jira/browse/FLINK-1617
[4] : https://issues.apache.org/jira/browse/FLINK-1534
[5] : http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf

Re: [GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534

Posted by Akshay Dixit <ak...@gmail.com>.
Thanks for going through it, Gyula.
I've made the necessary amendments to the timeline and submitted the
proposal.

Regards,
Akshay Dixit


On Thu, Mar 26, 2015 at 8:53 PM, Gyula Fóra <gy...@apache.org> wrote:

> I think it looks good for a start, we will have to work on the API a little
> bit together to make it fit smoothly with what we currently have.
>
> There is a few gaps in the timeline but that you have probably noticed :)
>
> Otherwise +1 from me.
>
> On Wed, Mar 25, 2015 at 11:35 PM, Akshay Dixit <ak...@gmail.com>
> wrote:
>
> > Hi,
> > The link to the draft proposal that I've prepared is
> > https://gist.github.com/akshaydixi/88f3fbcebab0119a6a31
> > It would be great if I could get some feedback on it.
> > Regards,
> > Akshay Dixit
> >
> > On Wed, Mar 25, 2015 at 2:03 AM, Akshay Dixit <ak...@gmail.com>
> > wrote:
> >
> > > Thanks Gyula.
> > >
> > > I agree too that simple and working implementations are preferrable
> over
> > > hacky complex solutions. I'll start sketching out an initial
> > straighforward
> > > API with only basic pattern matching features
> > > and base it on the existing windowing API. I'll post a draft of the
> > > proposal,  keeping the points you've said in mind, tomorrow, so you can
> > > look it over to see if its all right.
> > > Regards,
> > > Akshay Dixit
> > >
> > > On Tue, Mar 24, 2015 at 6:30 PM, Gyula Fóra <gy...@apache.org> wrote:
> > >
> > >> Hey Dixit,
> > >>
> > >> Sorry for the delay, I had to discuss this in more detail with some of
> > our
> > >> other core developers.
> > >>
> > >> The consensus seems to be that we would like push this project in a
> > >> direction where the changes can be quickly included in the next
> > releases.
> > >> For this it is essential that we implement features that are complete
> > (and
> > >> clean) from the users perspective. This does not necessarily mean that
> > we
> > >> would like to have everything at once but rather that it is preferable
> > to
> > >> start with something clean and simple (for instance the naive chained
> > >> filter approach) and progressively build more complex logic.
> > >>
> > >> This also mean that we would like to avoid "researchy" code in the
> > >> codebase
> > >> as much as possible. Of course once we have a stable api for this
> > >> functionality we can work towards making the optimizations that you
> have
> > >> mentioned like operator sharing and so on.
> > >>
> > >> The ideal proposal would give a clear sketch of the pattern matching
> API
> > >> that you would like to implement, which might be some added operators
> at
> > >> first to the current API and possible a DSL later with more advanced
> > >> functionality (this would probably go in a separate library until it
> is
> > >> very stable).
> > >>
> > >> So please in the proposal include a preview of what the pattern
> matching
> > >> syntax would look like integrated with the current operators, how it
> > would
> > >> interact with other parts of the system etc.
> > >>
> > >> These are the thing we need to figure out before we consider the
> > >> optimizations I think, because it usually turns out, that the API
> > >> semantics
> > >> you would like to provide can hugely affect (probably limit) the
> > >> possibilities that you have afterwards in terms of optimizations.
> > >>
> > >> Let me know if you have further questions regarding this :)
> > >>
> > >> Gyula
> > >>
> > >> On Tue, Mar 24, 2015 at 12:01 PM, Gyula Fóra <gy...@apache.org>
> wrote:
> > >>
> > >> > Hey,
> > >> >
> > >> > Give me an hour or so as I am in a meeting currently, but I will get
> > >> back
> > >> > to you afterwards.
> > >> >
> > >> > Regards,
> > >> > Gyula
> > >> >
> > >> > On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit <
> akshaydixi@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> Hi,
> > >> >> It'd really help if I got a reply soon. It'll be helpful in writing
> > the
> > >> >> proposal since the deadline is on 27th. Thanks
> > >> >> Regards,
> > >> >> Akshay Dixit
> > >> >>
> > >> >> On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <
> akshaydixi@gmail.com>
> > >> >> wrote:
> > >> >>
> > >> >> > Thanks for the explanation Marton. I've decided to try out for
> > >> >> FLINK-1534.
> > >> >> >
> > >> >> > After reading through the thesis[4] and a few other
> > papers[1][2][3],
> > >> I
> > >> >> > believe I've gathered a little context to ask more questions. But
> > I'm
> > >> >> still
> > >> >> > not sure how Flink's internals work
> > >> >> > so please bear with me. Although the ongoing effort to document
> the
> > >> >> > architecture and internal is really helpful for newbies like me
> and
> > >> >> would
> > >> >> > greatly decrease the ramping up time.
> > >> >> >
> > >> >> > Detecting a pattern of events would comprise of a pipeline that
> > >> accepts
> > >> >> > the pattern query and
> > >> >> > sources of DataStreams, and outputs detected matches of that
> > pattern
> > >> to
> > >> >> a
> > >> >> > sink or forwards it
> > >> >> > along to another stream for further computation.
> > >> >> >
> > >> >> > As you said, a simple filter-join-aggregate query system could be
> > >> >> > developed implementing using the existing Streaming windowing
> API.
> > >> >> > But matching over complex events and decoding their pattern
> queries
> > >> >> would
> > >> >> > require implementing a DSL that transforms queries into an
> > evaluation
> > >> >> > model. For e.g,
> > >> >> > in [1], the authors have implemented an NFA automaton with a
> shared
> > >> >> > versioned buffer that models the queries. In [4], the authors
> > >> >> > propose a new language that is much more expressive and compiles
> > >> into a
> > >> >> > topology graph for Storm.
> > >> >> >
> > >> >> > So in Flink's case, I believe the proposed DSL would generate
> > >> operator
> > >> >> > graphs for the Flink compiler to schedule Jobgraphs over
> > >> TaskManagers.
> > >> >> > If we don't depend on the Windowing API, would we need to create
> > new
> > >> >> > operators such as the Projection, Conjunction and Union operators
> > >> >> defined
> > >> >> > in [4] ?
> > >> >> > Also I would like to hear your thoughts on how to approach
> scaling
> > >> the
> > >> >> > pattern matching query. Note all these techniques talk about
> > scaling
> > >> a
> > >> >> > single query.
> > >> >> > I've read various ways such as
> > >> >> >
> > >> >> > 1.  Merging equivalent runs[1] -: This seems a good way to squash
> > >> >> multiple
> > >> >> > instances of pattern matching forks into a single one if they
> have
> > >> the
> > >> >> same
> > >> >> > state.
> > >> >> > But I'm not sure how we would implement this in Flink since this
> > is a
> > >> >> > runtime optimization.
> > >> >> >
> > >> >> > 2.  Implementing a matched version buffer[1] -: This would
> involve
> > >> >> sharing
> > >> >> > state of a buffer datastructure across multiple candidate match
> > >> >> instances
> > >> >> > for the pattern.
> > >> >> >
> > >> >> > 3.  Splitting complex composite patterns into simpler
> > sub-patterns[4]
> > >> >> and
> > >> >> > executing separate queries to detect those sub-patterns. This
> might
> > >> >> > translate into different
> > >> >> > tasks and duplicating the source datastreams to all the new
> > generated
> > >> >> > tasks.
> > >> >> >
> > >> >> > Also since I don't know how the Flink compiler behaves, would
> some
> > of
> > >> >> the
> > >> >> > optimizations involve making changes to it too?
> > >> >> >
> > >> >> > Regards,
> > >> >> > Akshay Dixit
> > >> >> >
> > >> >> > [1] : Efficient Pattern Matching over Event Streams
> > >> >> > <
> > >>
> http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf
> > >> >> >
> > >> >> > [2] : On Supporting Kleene Closure over Event Streams
> > >> >> > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf
> >
> > >> >> > [3] : Processing Flows of Information: From Data Stream to
> Complex
> > >> Event
> > >> >> > Processing
> > >> >> > <
> > >> >>
> > >>
> >
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf
> > >> >> >
> > >> >> > [4] : Distributing Complex Event Detection
> > >> >> > <
> > >> >>
> > >>
> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> > >
> > >> >> >
> > >> >> > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <
> > >> >> balassi.marton@gmail.com>
> > >> >> > wrote:
> > >> >> >
> > >> >> >> Dear Akshay,
> > >> >> >>
> > >> >> >> Thanks again for your interest and for the recent contribution
> to
> > >> >> >> streaming.
> > >> >> >>
> > >> >> >> Both of the projects mentioned wold be largely appreciated by
> the
> > >> >> >> community, and you can also propose other project suggestions
> here
> > >> for
> > >> >> >> discussion.
> > >> >> >>
> > >> >> >> Regarding FLINK-1534, the thesis I mentioned serves as a
> starting
> > >> point
> > >> >> >> and
> > >> >> >> indeed the basic solution can be implemented with filtering and
> > >> >> >> windowing/mapping with some state storing whether the cause of
> an
> > >> event
> > >> >> >> has
> > >> >> >> been already seen. Solely relying on the now existing windowing
> > API
> > >> >> this
> > >> >> >> however might cause performance issues if the events also have
> an
> > >> >> >> expiration timeout - some optimization there would be included.
> > The
> > >> >> >> further
> > >> >> >> challenge is to try to further exploit the parallel job
> execution
> > of
> > >> >> Flink
> > >> >> >> to possibly scale a pattern matching query.
> > >> >> >>
> > >> >> >> Best,
> > >> >> >>
> > >> >> >> Marton
> > >> >> >>
> > >> >> >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <
> > akshaydixi@gmail.com
> > >> >
> > >> >> >> wrote:
> > >> >> >>
> > >> >> >> > Hi,
> > >> >> >> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore,
> India.
> > >> I'm
> > >> >> >> > currently interested in distributed systems and stream
> > processing
> > >> >> and am
> > >> >> >> > looking to delve deeper into the subject, and hope to get some
> > >> >> insight
> > >> >> >> by
> > >> >> >> > contributing to Apache Flink. I've gathered some idea of the
> > >> >> >> > flink-streaming codebase by recently working on a PR for
> > >> >> FLINK-1450[2].
> > >> >> >> >
> > >> >> >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects
> > >> that I
> > >> >> >> would
> > >> >> >> > love to work on over the summer. I was wondering which amongst
> > >> these
> > >> >> >> would
> > >> >> >> > be more appreciated by the community, so I can start working
> > >> towards
> > >> >> a
> > >> >> >> > proposal for either one.
> > >> >> >> >
> > >> >> >> > Regarding FLINK-1534, I was wondering why would simply merging
> > and
> > >> >> >> > filtering the existing streams for events we want to detect
> not
> > >> work?
> > >> >> >> Also
> > >> >> >> > on going through the document mentioned by @mbalassi in the
> JIRA
> > >> >> >> > comment[5], the authors specify some Runtime Event Detection
> > >> >> concepts in
> > >> >> >> > Section 5.2. I'm assuming the project entails on building a
> > >> similar
> > >> >> >> analogy
> > >> >> >> > using Flink and the deliverables would include working pattern
> > >> >> matching
> > >> >> >> > operators over Flink DataStreams as described in the report.
> If
> > >> so,
> > >> >> then
> > >> >> >> > shouldn't it be trivial to implement the described the Binary
> > >> >> operator
> > >> >> >> > using a WindowedStream and a Filter?
> > >> >> >> > I hope my questions don't seem misplaced here and I would
> > >> appreciate
> > >> >> >> links
> > >> >> >> > to literature where I can learn more on the topic.
> > >> >> >> >
> > >> >> >> > Regards,
> > >> >> >> > Akshay Dixit
> > >> >> >> >
> > >> >> >> > [1] : http://akshaydixi.me
> > >> >> >> > [2] : https://github.com/apache/flink/pull/481
> > >> >> >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617
> > >> >> >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534
> > >> >> >> > [5] :
> > >> >> >> >
> > >> >>
> > >>
> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> > >> >> >> >
> > >> >> >>
> > >> >> >
> > >> >> >
> > >> >>
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>

Re: [GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534

Posted by Gyula Fóra <gy...@apache.org>.
I think it looks good for a start; we will have to work on the API a
little together to make it fit smoothly with what we currently have.

There are a few gaps in the timeline, but you have probably noticed
that :)

Otherwise +1 from me.

On Wed, Mar 25, 2015 at 11:35 PM, Akshay Dixit <ak...@gmail.com> wrote:

> Hi,
> The link to the draft proposal that I've prepared is
> https://gist.github.com/akshaydixi/88f3fbcebab0119a6a31
> It would be great if I could get some feedback on it.
> Regards,
> Akshay Dixit
>
> On Wed, Mar 25, 2015 at 2:03 AM, Akshay Dixit <ak...@gmail.com>
> wrote:
>
> > Thanks Gyula.
> >
> > I agree too that simple and working implementations are preferrable over
> > hacky complex solutions. I'll start sketching out an initial
> straighforward
> > API with only basic pattern matching features
> > and base it on the existing windowing API. I'll post a draft of the
> > proposal,  keeping the points you've said in mind, tomorrow, so you can
> > look it over to see if its all right.
> > Regards,
> > Akshay Dixit
> >
> > On Tue, Mar 24, 2015 at 6:30 PM, Gyula Fóra <gy...@apache.org> wrote:
> >
> >> Hey Dixit,
> >>
> >> Sorry for the delay, I had to discuss this in more detail with some of
> our
> >> other core developers.
> >>
> >> The consensus seems to be that we would like push this project in a
> >> direction where the changes can be quickly included in the next
> releases.
> >> For this it is essential that we implement features that are complete
> (and
> >> clean) from the users perspective. This does not necessarily mean that
> we
> >> would like to have everything at once but rather that it is preferable
> to
> >> start with something clean and simple (for instance the naive chained
> >> filter approach) and progressively build more complex logic.
> >>
> >> This also mean that we would like to avoid "researchy" code in the
> >> codebase
> >> as much as possible. Of course once we have a stable api for this
> >> functionality we can work towards making the optimizations that you have
> >> mentioned like operator sharing and so on.
> >>
> >> The ideal proposal would give a clear sketch of the pattern matching API
> >> that you would like to implement, which might be some added operators at
> >> first to the current API and possible a DSL later with more advanced
> >> functionality (this would probably go in a separate library until it is
> >> very stable).
> >>
> >> So please in the proposal include a preview of what the pattern matching
> >> syntax would look like integrated with the current operators, how it
> would
> >> interact with other parts of the system etc.
> >>
> >> These are the thing we need to figure out before we consider the
> >> optimizations I think, because it usually turns out, that the API
> >> semantics
> >> you would like to provide can hugely affect (probably limit) the
> >> possibilities that you have afterwards in terms of optimizations.
> >>
> >> Let me know if you have further questions regarding this :)
> >>
> >> Gyula
> >>
> >> On Tue, Mar 24, 2015 at 12:01 PM, Gyula Fóra <gy...@apache.org> wrote:
> >>
> >> > Hey,
> >> >
> >> > Give me an hour or so as I am in a meeting currently, but I will get
> >> back
> >> > to you afterwards.
> >> >
> >> > Regards,
> >> > Gyula
> >> >
> >> > On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit <ak...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >> It'd really help if I got a reply soon. It'll be helpful in writing
> the
> >> >> proposal since the deadline is on 27th. Thanks
> >> >> Regards,
> >> >> Akshay Dixit
> >> >>
> >> >> On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <ak...@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > Thanks for the explanation Marton. I've decided to try out for
> >> >> FLINK-1534.
> >> >> >
> >> >> > After reading through the thesis[4] and a few other
> papers[1][2][3],
> >> I
> >> >> > believe I've gathered a little context to ask more questions. But
> I'm
> >> >> still
> >> >> > not sure how Flink's internals work
> >> >> > so please bear with me. Although the ongoing effort to document the
> >> >> > architecture and internal is really helpful for newbies like me and
> >> >> would
> >> >> > greatly decrease the ramping up time.
> >> >> >
> >> >> > Detecting a pattern of events would comprise of a pipeline that
> >> accepts
> >> >> > the pattern query and
> >> >> > sources of DataStreams, and outputs detected matches of that
> pattern
> >> to
> >> >> a
> >> >> > sink or forwards it
> >> >> > along to another stream for further computation.
> >> >> >
> >> >> > As you said, a simple filter-join-aggregate query system could be
> >> >> > developed implementing using the existing Streaming windowing API.
> >> >> > But matching over complex events and decoding their pattern queries
> >> >> would
> >> >> > require implementing a DSL that transforms queries into an
> evaluation
> >> >> > model. For e.g,
> >> >> > in [1], the authors have implemented an NFA automaton with a shared
> >> >> > versioned buffer that models the queries. In [4], the authors
> >> >> > propose a new language that is much more expressive and compiles
> >> into a
> >> >> > topology graph for Storm.
> >> >> >
> >> >> > So in Flink's case, I believe the proposed DSL would generate
> >> operator
> >> >> > graphs for the Flink compiler to schedule Jobgraphs over
> >> TaskManagers.
> >> >> > If we don't depend on the Windowing API, would we need to create
> new
> >> >> > operators such as the Projection, Conjunction and Union operators
> >> >> defined
> >> >> > in [4] ?
> >> >> > Also I would like to hear your thoughts on how to approach scaling
> >> the
> >> >> > pattern matching query. Note all these techniques talk about
> scaling
> >> a
> >> >> > single query.
> >> >> > I've read various ways such as
> >> >> >
> >> >> > 1.  Merging equivalent runs[1] -: This seems a good way to squash
> >> >> multiple
> >> >> > instances of pattern matching forks into a single one if they have
> >> the
> >> >> same
> >> >> > state.
> >> >> > But I'm not sure how we would implement this in Flink since this
> is a
> >> >> > runtime optimization.
> >> >> >
> >> >> > 2.  Implementing a matched version buffer[1] -: This would involve
> >> >> sharing
> >> >> > state of a buffer datastructure across multiple candidate match
> >> >> instances
> >> >> > for the pattern.
> >> >> >
> >> >> > 3.  Splitting complex composite patterns into simpler
> sub-patterns[4]
> >> >> and
> >> >> > executing separate queries to detect those sub-patterns. This might
> >> >> > translate into different
> >> >> > tasks and duplicating the source datastreams to all the new
> generated
> >> >> > tasks.
> >> >> >
> >> >> > Also since I don't know how the Flink compiler behaves, would some
> of
> >> >> the
> >> >> > optimizations involve making changes to it too?
> >> >> >
> >> >> > Regards,
> >> >> > Akshay Dixit
> >> >> >
> >> >> > [1] : Efficient Pattern Matching over Event Streams
> >> >> > <
> >> http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf
> >> >> >
> >> >> > [2] : On Supporting Kleene Closure over Event Streams
> >> >> > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf>
> >> >> > [3] : Processing Flows of Information: From Data Stream to Complex
> >> Event
> >> >> > Processing
> >> >> > <
> >> >>
> >>
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf
> >> >> >
> >> >> > [4] : Distributing Complex Event Detection
> >> >> > <
> >> >>
> >> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> >
> >> >> >
> >> >> > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <
> >> >> balassi.marton@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> >> Dear Akshay,
> >> >> >>
> >> >> >> Thanks again for your interest and for the recent contribution to
> >> >> >> streaming.
> >> >> >>
> >> >> >> Both of the projects mentioned wold be largely appreciated by the
> >> >> >> community, and you can also propose other project suggestions here
> >> for
> >> >> >> discussion.
> >> >> >>
> >> >> >> Regarding FLINK-1534, the thesis I mentioned serves as a starting
> >> point
> >> >> >> and
> >> >> >> indeed the basic solution can be implemented with filtering and
> >> >> >> windowing/mapping with some state storing whether the cause of an
> >> event
> >> >> >> has
> >> >> >> been already seen. Solely relying on the now existing windowing
> API
> >> >> this
> >> >> >> however might cause performance issues if the events also have an
> >> >> >> expiration timeout - some optimization there would be included.
> The
> >> >> >> further
> >> >> >> challenge is to try to further exploit the parallel job execution
> of
> >> >> Flink
> >> >> >> to possibly scale a pattern matching query.
> >> >> >>
> >> >> >> Best,
> >> >> >>
> >> >> >> Marton
> >> >> >>
> >> >> >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <
> akshaydixi@gmail.com
> >> >
> >> >> >> wrote:
> >> >> >>
> >> >> >> > Hi,
> >> >> >> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, India.
> >> I'm
> >> >> >> > currently interested in distributed systems and stream
> processing
> >> >> and am
> >> >> >> > looking to delve deeper into the subject, and hope to get some
> >> >> insight
> >> >> >> by
> >> >> >> > contributing to Apache Flink. I've gathered some idea of the
> >> >> >> > flink-streaming codebase by recently working on a PR for
> >> >> FLINK-1450[2].
> >> >> >> >
> >> >> >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects
> >> that I
> >> >> >> would
> >> >> >> > love to work on over the summer. I was wondering which amongst
> >> these
> >> >> >> would
> >> >> >> > be more appreciated by the community, so I can start working
> >> towards
> >> >> a
> >> >> >> > proposal for either one.
> >> >> >> >
> >> >> >> > Regarding FLINK-1534, I was wondering why would simply merging
> and
> >> >> >> > filtering the existing streams for events we want to detect not
> >> work?
> >> >> >> Also
> >> >> >> > on going through the document mentioned by @mbalassi in the JIRA
> >> >> >> > comment[5], the authors specify some Runtime Event Detection
> >> >> concepts in
> >> >> >> > Section 5.2. I'm assuming the project entails on building a
> >> similar
> >> >> >> analogy
> >> >> >> > using Flink and the deliverables would include working pattern
> >> >> matching
> >> >> >> > operators over Flink DataStreams as described in the report. If
> >> so,
> >> >> then
> >> >> >> > shouldn't it be trivial to implement the described the Binary
> >> >> operator
> >> >> >> > using a WindowedStream and a Filter?
> >> >> >> > I hope my questions don't seem misplaced here and I would
> >> appreciate
> >> >> >> links
> >> >> >> > to literature where I can learn more on the topic.
> >> >> >> >
> >> >> >> > Regards,
> >> >> >> > Akshay Dixit
> >> >> >> >
> >> >> >> > [1] : http://akshaydixi.me
> >> >> >> > [2] : https://github.com/apache/flink/pull/481
> >> >> >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617
> >> >> >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534
> >> >> >> > [5] :
> >> >> >> >
> >> >>
> >> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>

Re: [GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534

Posted by Akshay Dixit <ak...@gmail.com>.
Hi,
The link to the draft proposal that I've prepared is
https://gist.github.com/akshaydixi/88f3fbcebab0119a6a31
It would be great if I could get some feedback on it.
Regards,
Akshay Dixit

On Wed, Mar 25, 2015 at 2:03 AM, Akshay Dixit <ak...@gmail.com> wrote:

> Thanks Gyula.
>
> I agree too that simple and working implementations are preferrable over
> hacky complex solutions. I'll start sketching out an initial straighforward
> API with only basic pattern matching features
> and base it on the existing windowing API. I'll post a draft of the
> proposal,  keeping the points you've said in mind, tomorrow, so you can
> look it over to see if its all right.
> Regards,
> Akshay Dixit
>
> On Tue, Mar 24, 2015 at 6:30 PM, Gyula Fóra <gy...@apache.org> wrote:
>
>> Hey Dixit,
>>
>> Sorry for the delay, I had to discuss this in more detail with some of our
>> other core developers.
>>
>> The consensus seems to be that we would like push this project in a
>> direction where the changes can be quickly included in the next releases.
>> For this it is essential that we implement features that are complete (and
>> clean) from the users perspective. This does not necessarily mean that we
>> would like to have everything at once but rather that it is preferable to
>> start with something clean and simple (for instance the naive chained
>> filter approach) and progressively build more complex logic.
>>
>> This also mean that we would like to avoid "researchy" code in the
>> codebase
>> as much as possible. Of course once we have a stable api for this
>> functionality we can work towards making the optimizations that you have
>> mentioned like operator sharing and so on.
>>
>> The ideal proposal would give a clear sketch of the pattern matching API
>> that you would like to implement, which might be some added operators at
>> first to the current API and possible a DSL later with more advanced
>> functionality (this would probably go in a separate library until it is
>> very stable).
>>
>> So please in the proposal include a preview of what the pattern matching
>> syntax would look like integrated with the current operators, how it would
>> interact with other parts of the system etc.
>>
>> These are the thing we need to figure out before we consider the
>> optimizations I think, because it usually turns out, that the API
>> semantics
>> you would like to provide can hugely affect (probably limit) the
>> possibilities that you have afterwards in terms of optimizations.
>>
>> Let me know if you have further questions regarding this :)
>>
>> Gyula
>>
>> On Tue, Mar 24, 2015 at 12:01 PM, Gyula Fóra <gy...@apache.org> wrote:
>>
>> > Hey,
>> >
>> > Give me an hour or so as I am in a meeting currently, but I will get
>> back
>> > to you afterwards.
>> >
>> > Regards,
>> > Gyula
>> >
>> > On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit <ak...@gmail.com>
>> > wrote:
>> >
>> >> Hi,
>> >> It'd really help if I got a reply soon. It'll be helpful in writing the
>> >> proposal since the deadline is on 27th. Thanks
>> >> Regards,
>> >> Akshay Dixit
>> >>
>> >> On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <ak...@gmail.com>
>> >> wrote:
>> >>
>> >> > Thanks for the explanation Marton. I've decided to try out for
>> >> FLINK-1534.
>> >> >
>> >> > After reading through the thesis[4] and a few other papers[1][2][3],
>> I
>> >> > believe I've gathered a little context to ask more questions. But I'm
>> >> still
>> >> > not sure how Flink's internals work
>> >> > so please bear with me. Although the ongoing effort to document the
>> >> > architecture and internal is really helpful for newbies like me and
>> >> would
>> >> > greatly decrease the ramping up time.
>> >> >
>> >> > Detecting a pattern of events would comprise of a pipeline that
>> accepts
>> >> > the pattern query and
>> >> > sources of DataStreams, and outputs detected matches of that pattern
>> to
>> >> a
>> >> > sink or forwards it
>> >> > along to another stream for further computation.
>> >> >
>> >> > As you said, a simple filter-join-aggregate query system could be
>> >> > developed implementing using the existing Streaming windowing API.
>> >> > But matching over complex events and decoding their pattern queries
>> >> would
>> >> > require implementing a DSL that transforms queries into an evaluation
>> >> > model. For e.g,
>> >> > in [1], the authors have implemented an NFA automaton with a shared
>> >> > versioned buffer that models the queries. In [4], the authors
>> >> > propose a new language that is much more expressive and compiles
>> into a
>> >> > topology graph for Storm.
>> >> >
>> >> > So in Flink's case, I believe the proposed DSL would generate
>> operator
>> >> > graphs for the Flink compiler to schedule Jobgraphs over
>> TaskManagers.
>> >> > If we don't depend on the Windowing API, would we need to create new
>> >> > operators such as the Projection, Conjunction and Union operators
>> >> defined
>> >> > in [4] ?
>> >> > Also I would like to hear your thoughts on how to approach scaling
>> the
>> >> > pattern matching query. Note all these techniques talk about scaling
>> a
>> >> > single query.
>> >> > I've read various ways such as
>> >> >
>> >> > 1.  Merging equivalent runs[1] -: This seems a good way to squash
>> >> multiple
>> >> > instances of pattern matching forks into a single one if they have
>> the
>> >> same
>> >> > state.
>> >> > But I'm not sure how we would implement this in Flink since this is a
>> >> > runtime optimization.
>> >> >
>> >> > 2.  Implementing a matched version buffer[1] -: This would involve
>> >> sharing
>> >> > state of a buffer datastructure across multiple candidate match
>> >> instances
>> >> > for the pattern.
>> >> >
>> >> > 3.  Splitting complex composite patterns into simpler sub-patterns[4]
>> >> and
>> >> > executing separate queries to detect those sub-patterns. This might
>> >> > translate into different
>> >> > tasks and duplicating the source datastreams to all the new generated
>> >> > tasks.
>> >> >
>> >> > Also since I don't know how the Flink compiler behaves, would some of
>> >> the
>> >> > optimizations involve making changes to it too?
>> >> >
>> >> > Regards,
>> >> > Akshay Dixit
>> >> >
>> >> > [1] : Efficient Pattern Matching over Event Streams
>> >> > <
>> http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf
>> >> >
>> >> > [2] : On Supporting Kleene Closure over Event Streams
>> >> > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf>
>> >> > [3] : Processing Flows of Information: From Data Stream to Complex
>> Event
>> >> > Processing
>> >> > <
>> >>
>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf
>> >> >
>> >> > [4] : Distributing Complex Event Detection
>> >> > <
>> >>
>> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf>
>> >> >
>> >> > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <
>> >> balassi.marton@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Dear Akshay,
>> >> >>
>> >> >> Thanks again for your interest and for the recent contribution to
>> >> >> streaming.
>> >> >>
>> >> >> Both of the projects mentioned wold be largely appreciated by the
>> >> >> community, and you can also propose other project suggestions here
>> for
>> >> >> discussion.
>> >> >>
>> >> >> Regarding FLINK-1534, the thesis I mentioned serves as a starting
>> point
>> >> >> and
>> >> >> indeed the basic solution can be implemented with filtering and
>> >> >> windowing/mapping with some state storing whether the cause of an
>> event
>> >> >> has
>> >> >> been already seen. Solely relying on the now existing windowing API
>> >> this
>> >> >> however might cause performance issues if the events also have an
>> >> >> expiration timeout - some optimization there would be included. The
>> >> >> further
>> >> >> challenge is to try to further exploit the parallel job execution of
>> >> Flink
>> >> >> to possibly scale a pattern matching query.
>> >> >>
>> >> >> Best,
>> >> >>
>> >> >> Marton
>> >> >>
>> >> >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <akshaydixi@gmail.com
>> >
>> >> >> wrote:
>> >> >>
>> >> >> > Hi,
>> >> >> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, India.
>> I'm
>> >> >> > currently interested in distributed systems and stream processing
>> >> and am
>> >> >> > looking to delve deeper into the subject, and hope to get some
>> >> insight
>> >> >> by
>> >> >> > contributing to Apache Flink. I've gathered some idea of the
>> >> >> > flink-streaming codebase by recently working on a PR for
>> >> FLINK-1450[2].
>> >> >> >
>> >> >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects
>> that I
>> >> >> would
>> >> >> > love to work on over the summer. I was wondering which amongst
>> these
>> >> >> would
>> >> >> > be more appreciated by the community, so I can start working
>> towards
>> >> a
>> >> >> > proposal for either one.
>> >> >> >
>> >> >> > Regarding FLINK-1534, I was wondering why would simply merging and
>> >> >> > filtering the existing streams for events we want to detect not
>> work?
>> >> >> Also
>> >> >> > on going through the document mentioned by @mbalassi in the JIRA
>> >> >> > comment[5], the authors specify some Runtime Event Detection
>> >> concepts in
>> >> >> > Section 5.2. I'm assuming the project entails on building a
>> similar
>> >> >> analogy
>> >> >> > using Flink and the deliverables would include working pattern
>> >> matching
>> >> >> > operators over Flink DataStreams as described in the report. If
>> so,
>> >> then
>> >> >> > shouldn't it be trivial to implement the described the Binary
>> >> operator
>> >> >> > using a WindowedStream and a Filter?
>> >> >> > I hope my questions don't seem misplaced here and I would
>> appreciate
>> >> >> links
>> >> >> > to literature where I can learn more on the topic.
>> >> >> >
>> >> >> > Regards,
>> >> >> > Akshay Dixit
>> >> >> >
>> >> >> > [1] : http://akshaydixi.me
>> >> >> > [2] : https://github.com/apache/flink/pull/481
>> >> >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617
>> >> >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534
>> >> >> > [5] :
>> >> >> >
>> >>
>> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >>
>> >
>> >
>>
>
>

Re: [GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534

Posted by Akshay Dixit <ak...@gmail.com>.
Thanks Gyula.

I agree too that simple, working implementations are preferable to
hacky, complex solutions. I'll start sketching out an initial,
straightforward API with only basic pattern-matching features and base
it on the existing windowing API. I'll post a draft of the proposal
tomorrow, keeping the points you've made in mind, so you can look it
over and see if it's all right.
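
As a first, deliberately simple illustration of the kind of building
block I have in mind (not the API design itself), a step like "A
followed by B" can be reduced to a stateful flatMap. The String events
and the plain instance flag below are only for illustration; a real
version would need keyed, fault-tolerant state and the windowing API
for expiration:

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

// Toy "A followed by B" detector: remembers whether an "A" has been seen
// and emits a match on the next "B". The plain instance field makes this
// neither parallel-safe nor fault tolerant -- illustration only.
public class FollowedByDetector implements FlatMapFunction<String, String> {

    private boolean seenA = false;

    @Override
    public void flatMap(String event, Collector<String> out) {
        if ("A".equals(event)) {
            seenA = true;
        } else if ("B".equals(event) && seenA) {
            out.collect("match: A followed by B");
            seenA = false;
        }
    }
}

// usage, with surrounding types elided:
//   DataStream<String> matches = events.flatMap(new FollowedByDetector());
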
Regards,
Akshay Dixit

On Tue, Mar 24, 2015 at 6:30 PM, Gyula Fóra <gy...@apache.org> wrote:

> Hey Dixit,
>
> Sorry for the delay, I had to discuss this in more detail with some of our
> other core developers.
>
> The consensus seems to be that we would like push this project in a
> direction where the changes can be quickly included in the next releases.
> For this it is essential that we implement features that are complete (and
> clean) from the users perspective. This does not necessarily mean that we
> would like to have everything at once but rather that it is preferable to
> start with something clean and simple (for instance the naive chained
> filter approach) and progressively build more complex logic.
>
> This also mean that we would like to avoid "researchy" code in the codebase
> as much as possible. Of course once we have a stable api for this
> functionality we can work towards making the optimizations that you have
> mentioned like operator sharing and so on.
>
> The ideal proposal would give a clear sketch of the pattern matching API
> that you would like to implement, which might be some added operators at
> first to the current API and possible a DSL later with more advanced
> functionality (this would probably go in a separate library until it is
> very stable).
>
> So please in the proposal include a preview of what the pattern matching
> syntax would look like integrated with the current operators, how it would
> interact with other parts of the system etc.
>
> These are the thing we need to figure out before we consider the
> optimizations I think, because it usually turns out, that the API semantics
> you would like to provide can hugely affect (probably limit) the
> possibilities that you have afterwards in terms of optimizations.
>
> Let me know if you have further questions regarding this :)
>
> Gyula
>
> On Tue, Mar 24, 2015 at 12:01 PM, Gyula Fóra <gy...@apache.org> wrote:
>
> > Hey,
> >
> > Give me an hour or so as I am in a meeting currently, but I will get back
> > to you afterwards.
> >
> > Regards,
> > Gyula
> >
> > On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit <ak...@gmail.com>
> > wrote:
> >
> >> Hi,
> >> It'd really help if I got a reply soon. It'll be helpful in writing the
> >> proposal since the deadline is on 27th. Thanks
> >> Regards,
> >> Akshay Dixit
> >>
> >> On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <ak...@gmail.com>
> >> wrote:
> >>
> >> > Thanks for the explanation Marton. I've decided to try out for
> >> FLINK-1534.
> >> >
> >> > After reading through the thesis[4] and a few other papers[1][2][3], I
> >> > believe I've gathered a little context to ask more questions. But I'm
> >> still
> >> > not sure how Flink's internals work
> >> > so please bear with me. Although the ongoing effort to document the
> >> > architecture and internal is really helpful for newbies like me and
> >> would
> >> > greatly decrease the ramping up time.
> >> >
> >> > Detecting a pattern of events would comprise of a pipeline that
> accepts
> >> > the pattern query and
> >> > sources of DataStreams, and outputs detected matches of that pattern
> to
> >> a
> >> > sink or forwards it
> >> > along to another stream for further computation.
> >> >
> >> > As you said, a simple filter-join-aggregate query system could be
> >> > developed implementing using the existing Streaming windowing API.
> >> > But matching over complex events and decoding their pattern queries
> >> would
> >> > require implementing a DSL that transforms queries into an evaluation
> >> > model. For e.g,
> >> > in [1], the authors have implemented an NFA automaton with a shared
> >> > versioned buffer that models the queries. In [4], the authors
> >> > propose a new language that is much more expressive and compiles into
> a
> >> > topology graph for Storm.
> >> >
> >> > So in Flink's case, I believe the proposed DSL would generate operator
> >> > graphs for the Flink compiler to schedule Jobgraphs over TaskManagers.
> >> > If we don't depend on the Windowing API, would we need to create new
> >> > operators such as the Projection, Conjunction and Union operators
> >> defined
> >> > in [4] ?
> >> > Also I would like to hear your thoughts on how to approach scaling the
> >> > pattern matching query. Note all these techniques talk about scaling a
> >> > single query.
> >> > I've read various ways such as
> >> >
> >> > 1.  Merging equivalent runs[1] -: This seems a good way to squash
> >> multiple
> >> > instances of pattern matching forks into a single one if they have the
> >> same
> >> > state.
> >> > But I'm not sure how we would implement this in Flink since this is a
> >> > runtime optimization.
> >> >
> >> > 2.  Implementing a matched version buffer[1] -: This would involve
> >> sharing
> >> > state of a buffer datastructure across multiple candidate match
> >> instances
> >> > for the pattern.
> >> >
> >> > 3.  Splitting complex composite patterns into simpler sub-patterns[4]
> >> and
> >> > executing separate queries to detect those sub-patterns. This might
> >> > translate into different
> >> > tasks and duplicating the source datastreams to all the new generated
> >> > tasks.
> >> >
> >> > Also since I don't know how the Flink compiler behaves, would some of
> >> the
> >> > optimizations involve making changes to it too?
> >> >
> >> > Regards,
> >> > Akshay Dixit
> >> >
> >> > [1] : Efficient Pattern Matching over Event Streams
> >> > <
> http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf
> >> >
> >> > [2] : On Supporting Kleene Closure over Event Streams
> >> > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf>
> >> > [3] : Processing Flows of Information: From Data Stream to Complex
> Event
> >> > Processing
> >> > <
> >>
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf
> >> >
> >> > [4] : Distributing Complex Event Detection
> >> > <
> >> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> >
> >> >
> >> > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <
> >> balassi.marton@gmail.com>
> >> > wrote:
> >> >
> >> >> Dear Akshay,
> >> >>
> >> >> Thanks again for your interest and for the recent contribution to
> >> >> streaming.
> >> >>
> >> >> Both of the projects mentioned wold be largely appreciated by the
> >> >> community, and you can also propose other project suggestions here
> for
> >> >> discussion.
> >> >>
> >> >> Regarding FLINK-1534, the thesis I mentioned serves as a starting
> point
> >> >> and
> >> >> indeed the basic solution can be implemented with filtering and
> >> >> windowing/mapping with some state storing whether the cause of an
> event
> >> >> has
> >> >> been already seen. Solely relying on the now existing windowing API
> >> this
> >> >> however might cause performance issues if the events also have an
> >> >> expiration timeout - some optimization there would be included. The
> >> >> further
> >> >> challenge is to try to further exploit the parallel job execution of
> >> Flink
> >> >> to possibly scale a pattern matching query.
> >> >>
> >> >> Best,
> >> >>
> >> >> Marton
> >> >>
> >> >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <ak...@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > Hi,
> >> >> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, India.
> I'm
> >> >> > currently interested in distributed systems and stream processing
> >> and am
> >> >> > looking to delve deeper into the subject, and hope to get some
> >> insight
> >> >> by
> >> >> > contributing to Apache Flink. I've gathered some idea of the
> >> >> > flink-streaming codebase by recently working on a PR for
> >> FLINK-1450[2].
> >> >> >
> >> >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects that
> I
> >> >> would
> >> >> > love to work on over the summer. I was wondering which amongst
> these
> >> >> would
> >> >> > be more appreciated by the community, so I can start working
> towards
> >> a
> >> >> > proposal for either one.
> >> >> >
> >> >> > Regarding FLINK-1534, I was wondering why would simply merging and
> >> >> > filtering the existing streams for events we want to detect not
> work?
> >> >> Also
> >> >> > on going through the document mentioned by @mbalassi in the JIRA
> >> >> > comment[5], the authors specify some Runtime Event Detection
> >> concepts in
> >> >> > Section 5.2. I'm assuming the project entails on building a similar
> >> >> analogy
> >> >> > using Flink and the deliverables would include working pattern
> >> matching
> >> >> > operators over Flink DataStreams as described in the report. If so,
> >> then
> >> >> > shouldn't it be trivial to implement the described the Binary
> >> operator
> >> >> > using a WindowedStream and a Filter?
> >> >> > I hope my questions don't seem misplaced here and I would
> appreciate
> >> >> links
> >> >> > to literature where I can learn more on the topic.
> >> >> >
> >> >> > Regards,
> >> >> > Akshay Dixit
> >> >> >
> >> >> > [1] : http://akshaydixi.me
> >> >> > [2] : https://github.com/apache/flink/pull/481
> >> >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617
> >> >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534
> >> >> > [5] :
> >> >> >
> >> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>

Re: [GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534

Posted by Gyula Fóra <gy...@apache.org>.
Hey Dixit,

Sorry for the delay, I had to discuss this in more detail with some of our
other core developers.

The consensus seems to be that we would like to push this project in a
direction where the changes can be quickly included in the next
releases. For this it is essential that we implement features that are
complete (and clean) from the user's perspective. This does not
necessarily mean that we would like to have everything at once, but
rather that it is preferable to start with something clean and simple
(for instance the naive chained filter approach) and progressively
build more complex logic.
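
(To be concrete about what I mean by the chained filter approach, here
is a minimal sketch with an invented Event type and example predicates;
every simple constraint of a query becomes one plain filter in a chain,
and the sequencing logic can be layered on top later. Exact API details
may differ between Flink versions.)

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainedFilterSketch {

    // Invented event type for the example.
    public static class Event {
        public String deviceId;
        public double temperature;

        public Event() {}

        public Event(String deviceId, double temperature) {
            this.deviceId = deviceId;
            this.temperature = temperature;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Event> events =
                env.fromElements(new Event("boiler-7", 95.0));

        DataStream<Event> alerts = events
                // constraint 1: only events from the device we care about
                .filter(new FilterFunction<Event>() {
                    @Override
                    public boolean filter(Event e) {
                        return "boiler-7".equals(e.deviceId);
                    }
                })
                // constraint 2: only readings above the threshold
                .filter(new FilterFunction<Event>() {
                    @Override
                    public boolean filter(Event e) {
                        return e.temperature > 90.0;
                    }
                });

        alerts.print();
        env.execute("chained-filter-sketch");
    }
}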

This also means that we would like to avoid "researchy" code in the
codebase as much as possible. Of course, once we have a stable API for
this functionality we can work towards making the optimizations that
you have mentioned, like operator sharing and so on.

The ideal proposal would give a clear sketch of the pattern matching
API that you would like to implement, which might be some operators
added to the current API at first, and possibly a DSL later with more
advanced functionality (this would probably go in a separate library
until it is very stable).

So please include in the proposal a preview of what the pattern
matching syntax would look like integrated with the current operators,
how it would interact with other parts of the system, etc.
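
For example, something along these lines -- every name here
(matchPattern, Pattern, begin, followedBy, within, select) is invented
and nothing like it exists in Flink today; the point is only to show
the level of detail we are after and how such an operator could sit
next to the existing DataStream operators:

// hypothetical syntax preview, not working code
DataStream<Alert> alerts = events
        .matchPattern(
            Pattern.begin("warning", e -> e.isWarning())
                   .followedBy("error", e -> e.isError())
                   .within(minutes(10)))
        .select(match -> new Alert(match.get("warning"), match.get("error")));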

These are the things we need to figure out before we consider the
optimizations, I think, because it usually turns out that the API
semantics you would like to provide can hugely affect (and probably
limit) the possibilities that you have afterwards in terms of
optimizations.

Let me know if you have further questions regarding this :)

Gyula

On Tue, Mar 24, 2015 at 12:01 PM, Gyula Fóra <gy...@apache.org> wrote:

> Hey,
>
> Give me an hour or so as I am in a meeting currently, but I will get back
> to you afterwards.
>
> Regards,
> Gyula
>
> On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit <ak...@gmail.com>
> wrote:
>
>> Hi,
>> It'd really help if I got a reply soon. It'll be helpful in writing the
>> proposal since the deadline is on 27th. Thanks
>> Regards,
>> Akshay Dixit
>>
>> On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <ak...@gmail.com>
>> wrote:
>>
>> > Thanks for the explanation Marton. I've decided to try out for
>> FLINK-1534.
>> >
>> > After reading through the thesis[4] and a few other papers[1][2][3], I
>> > believe I've gathered a little context to ask more questions. But I'm
>> still
>> > not sure how Flink's internals work
>> > so please bear with me. Although the ongoing effort to document the
>> > architecture and internal is really helpful for newbies like me and
>> would
>> > greatly decrease the ramping up time.
>> >
>> > Detecting a pattern of events would comprise of a pipeline that accepts
>> > the pattern query and
>> > sources of DataStreams, and outputs detected matches of that pattern to
>> a
>> > sink or forwards it
>> > along to another stream for further computation.
>> >
>> > As you said, a simple filter-join-aggregate query system could be
>> > developed implementing using the existing Streaming windowing API.
>> > But matching over complex events and decoding their pattern queries
>> would
>> > require implementing a DSL that transforms queries into an evaluation
>> > model. For e.g,
>> > in [1], the authors have implemented an NFA automaton with a shared
>> > versioned buffer that models the queries. In [4], the authors
>> > propose a new language that is much more expressive and compiles into a
>> > topology graph for Storm.
>> >
>> > So in Flink's case, I believe the proposed DSL would generate operator
>> > graphs for the Flink compiler to schedule Jobgraphs over TaskManagers.
>> > If we don't depend on the Windowing API, would we need to create new
>> > operators such as the Projection, Conjunction and Union operators
>> defined
>> > in [4] ?
>> > Also I would like to hear your thoughts on how to approach scaling the
>> > pattern matching query. Note all these techniques talk about scaling a
>> > single query.
>> > I've read various ways such as
>> >
>> > 1.  Merging equivalent runs[1] -: This seems a good way to squash
>> multiple
>> > instances of pattern matching forks into a single one if they have the
>> same
>> > state.
>> > But I'm not sure how we would implement this in Flink since this is a
>> > runtime optimization.
>> >
>> > 2.  Implementing a matched version buffer[1] -: This would involve
>> sharing
>> > state of a buffer datastructure across multiple candidate match
>> instances
>> > for the pattern.
>> >
>> > 3.  Splitting complex composite patterns into simpler sub-patterns[4]
>> and
>> > executing separate queries to detect those sub-patterns. This might
>> > translate into different
>> > tasks and duplicating the source datastreams to all the new generated
>> > tasks.
>> >
>> > Also since I don't know how the Flink compiler behaves, would some of
>> the
>> > optimizations involve making changes to it too?
>> >
>> > Regards,
>> > Akshay Dixit
>> >
>> > [1] : Efficient Pattern Matching over Event Streams
>> > <http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf
>> >
>> > [2] : On Supporting Kleene Closure over Event Streams
>> > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf>
>> > [3] : Processing Flows of Information: From Data Stream to Complex Event
>> > Processing
>> > <
>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf
>> >
>> > [4] : Distributing Complex Event Detection
>> > <
>> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf>
>> >
>> > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <
>> balassi.marton@gmail.com>
>> > wrote:
>> >
>> >> Dear Akshay,
>> >>
>> >> Thanks again for your interest and for the recent contribution to
>> >> streaming.
>> >>
>> >> Both of the projects mentioned wold be largely appreciated by the
>> >> community, and you can also propose other project suggestions here for
>> >> discussion.
>> >>
>> >> Regarding FLINK-1534, the thesis I mentioned serves as a starting point
>> >> and
>> >> indeed the basic solution can be implemented with filtering and
>> >> windowing/mapping with some state storing whether the cause of an event
>> >> has
>> >> been already seen. Solely relying on the now existing windowing API
>> this
>> >> however might cause performance issues if the events also have an
>> >> expiration timeout - some optimization there would be included. The
>> >> further
>> >> challenge is to try to further exploit the parallel job execution of
>> Flink
>> >> to possibly scale a pattern matching query.
>> >>
>> >> Best,
>> >>
>> >> Marton
>> >>
>> >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <ak...@gmail.com>
>> >> wrote:
>> >>
>> >> > Hi,
>> >> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, India. I'm
>> >> > currently interested in distributed systems and stream processing
>> and am
>> >> > looking to delve deeper into the subject, and hope to get some
>> insight
>> >> by
>> >> > contributing to Apache Flink. I've gathered some idea of the
>> >> > flink-streaming codebase by recently working on a PR for
>> FLINK-1450[2].
>> >> >
>> >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects that I
>> >> would
>> >> > love to work on over the summer. I was wondering which amongst these
>> >> would
>> >> > be more appreciated by the community, so I can start working towards
>> a
>> >> > proposal for either one.
>> >> >
>> >> > Regarding FLINK-1534, I was wondering why would simply merging and
>> >> > filtering the existing streams for events we want to detect not work?
>> >> Also
>> >> > on going through the document mentioned by @mbalassi in the JIRA
>> >> > comment[5], the authors specify some Runtime Event Detection
>> concepts in
>> >> > Section 5.2. I'm assuming the project entails on building a similar
>> >> analogy
>> >> > using Flink and the deliverables would include working pattern
>> matching
>> >> > operators over Flink DataStreams as described in the report. If so,
>> then
>> >> > shouldn't it be trivial to implement the described the Binary
>> operator
>> >> > using a WindowedStream and a Filter?
>> >> > I hope my questions don't seem misplaced here and I would appreciate
>> >> links
>> >> > to literature where I can learn more on the topic.
>> >> >
>> >> > Regards,
>> >> > Akshay Dixit
>> >> >
>> >> > [1] : http://akshaydixi.me
>> >> > [2] : https://github.com/apache/flink/pull/481
>> >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617
>> >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534
>> >> > [5] :
>> >> >
>> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
>> >> >
>> >>
>> >
>> >
>>
>
>

Re: [GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534

Posted by Gyula Fóra <gy...@apache.org>.
Hey,

Give me an hour or so as I am in a meeting currently, but I will get back
to you afterwards.

Regards,
Gyula

On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit <ak...@gmail.com> wrote:

> Hi,
> It'd really help if I got a reply soon. It'll be helpful in writing the
> proposal since the deadline is on 27th. Thanks
> Regards,
> Akshay Dixit
>
> On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <ak...@gmail.com>
> wrote:
>
> > Thanks for the explanation Marton. I've decided to try out for
> FLINK-1534.
> >
> > After reading through the thesis[4] and a few other papers[1][2][3], I
> > believe I've gathered a little context to ask more questions. But I'm
> still
> > not sure how Flink's internals work
> > so please bear with me. Although the ongoing effort to document the
> > architecture and internal is really helpful for newbies like me and would
> > greatly decrease the ramping up time.
> >
> > Detecting a pattern of events would comprise of a pipeline that accepts
> > the pattern query and
> > sources of DataStreams, and outputs detected matches of that pattern to a
> > sink or forwards it
> > along to another stream for further computation.
> >
> > As you said, a simple filter-join-aggregate query system could be
> > developed implementing using the existing Streaming windowing API.
> > But matching over complex events and decoding their pattern queries would
> > require implementing a DSL that transforms queries into an evaluation
> > model. For e.g,
> > in [1], the authors have implemented an NFA automaton with a shared
> > versioned buffer that models the queries. In [4], the authors
> > propose a new language that is much more expressive and compiles into a
> > topology graph for Storm.
> >
> > So in Flink's case, I believe the proposed DSL would generate operator
> > graphs for the Flink compiler to schedule Jobgraphs over TaskManagers.
> > If we don't depend on the Windowing API, would we need to create new
> > operators such as the Projection, Conjunction and Union operators defined
> > in [4] ?
> > Also I would like to hear your thoughts on how to approach scaling the
> > pattern matching query. Note all these techniques talk about scaling a
> > single query.
> > I've read various ways such as
> >
> > 1.  Merging equivalent runs[1] -: This seems a good way to squash
> multiple
> > instances of pattern matching forks into a single one if they have the
> same
> > state.
> > But I'm not sure how we would implement this in Flink since this is a
> > runtime optimization.
> >
> > 2.  Implementing a matched version buffer[1] -: This would involve
> sharing
> > state of a buffer datastructure across multiple candidate match instances
> > for the pattern.
> >
> > 3.  Splitting complex composite patterns into simpler sub-patterns[4] and
> > executing separate queries to detect those sub-patterns. This might
> > translate into different
> > tasks and duplicating the source datastreams to all the new generated
> > tasks.
> >
> > Also since I don't know how the Flink compiler behaves, would some of the
> > optimizations involve making changes to it too?
> >
> > Regards,
> > Akshay Dixit
> >
> > [1] : Efficient Pattern Matching over Event Streams
> > <http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf>
> > [2] : On Supporting Kleene Closure over Event Streams
> > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf>
> > [3] : Processing Flows of Information: From Data Stream to Complex Event
> > Processing
> > <
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf
> >
> > [4] : Distributing Complex Event Detection
> > <http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> >
> >
> > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <
> balassi.marton@gmail.com>
> > wrote:
> >
> >> Dear Akshay,
> >>
> >> Thanks again for your interest and for the recent contribution to
> >> streaming.
> >>
> >> Both of the projects mentioned wold be largely appreciated by the
> >> community, and you can also propose other project suggestions here for
> >> discussion.
> >>
> >> Regarding FLINK-1534, the thesis I mentioned serves as a starting point
> >> and
> >> indeed the basic solution can be implemented with filtering and
> >> windowing/mapping with some state storing whether the cause of an event
> >> has
> >> been already seen. Solely relying on the now existing windowing API this
> >> however might cause performance issues if the events also have an
> >> expiration timeout - some optimization there would be included. The
> >> further
> >> challenge is to try to further exploit the parallel job execution of
> Flink
> >> to possibly scale a pattern matching query.
> >>
> >> Best,
> >>
> >> Marton
> >>
> >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <ak...@gmail.com>
> >> wrote:
> >>
> >> > Hi,
> >> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, India. I'm
> >> > currently interested in distributed systems and stream processing and
> am
> >> > looking to delve deeper into the subject, and hope to get some insight
> >> by
> >> > contributing to Apache Flink. I've gathered some idea of the
> >> > flink-streaming codebase by recently working on a PR for
> FLINK-1450[2].
> >> >
> >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects that I
> >> would
> >> > love to work on over the summer. I was wondering which amongst these
> >> would
> >> > be more appreciated by the community, so I can start working towards a
> >> > proposal for either one.
> >> >
> >> > Regarding FLINK-1534, I was wondering why would simply merging and
> >> > filtering the existing streams for events we want to detect not work?
> >> Also
> >> > on going through the document mentioned by @mbalassi in the JIRA
> >> > comment[5], the authors specify some Runtime Event Detection concepts
> in
> >> > Section 5.2. I'm assuming the project entails on building a similar
> >> analogy
> >> > using Flink and the deliverables would include working pattern
> matching
> >> > operators over Flink DataStreams as described in the report. If so,
> then
> >> > shouldn't it be trivial to implement the described the Binary operator
> >> > using a WindowedStream and a Filter?
> >> > I hope my questions don't seem misplaced here and I would appreciate
> >> links
> >> > to literature where I can learn more on the topic.
> >> >
> >> > Regards,
> >> > Akshay Dixit
> >> >
> >> > [1] : http://akshaydixi.me
> >> > [2] : https://github.com/apache/flink/pull/481
> >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617
> >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534
> >> > [5] :
> >> >
> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> >> >
> >>
> >
> >
>

Re: [GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534

Posted by Akshay Dixit <ak...@gmail.com>.
Hi,
It'd really help if I could get a reply soon, since the proposal deadline
is on the 27th. Thanks
Regards,
Akshay Dixit

On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <ak...@gmail.com> wrote:

> Thanks for the explanation Marton. I've decided to try out for FLINK-1534.
>
> After reading through the thesis[4] and a few other papers[1][2][3], I
> believe I've gathered a little context to ask more questions. But I'm still
> not sure how Flink's internals work
> so please bear with me. Although the ongoing effort to document the
> architecture and internal is really helpful for newbies like me and would
> greatly decrease the ramping up time.
>
> Detecting a pattern of events would comprise of a pipeline that accepts
> the pattern query and
> sources of DataStreams, and outputs detected matches of that pattern to a
> sink or forwards it
> along to another stream for further computation.
>
> As you said, a simple filter-join-aggregate query system could be
> developed implementing using the existing Streaming windowing API.
> But matching over complex events and decoding their pattern queries would
> require implementing a DSL that transforms queries into an evaluation
> model. For e.g,
> in [1], the authors have implemented an NFA automaton with a shared
> versioned buffer that models the queries. In [4], the authors
> propose a new language that is much more expressive and compiles into a
> topology graph for Storm.
>
> So in Flink's case, I believe the proposed DSL would generate operator
> graphs for the Flink compiler to schedule Jobgraphs over TaskManagers.
> If we don't depend on the Windowing API, would we need to create new
> operators such as the Projection, Conjunction and Union operators defined
> in [4] ?
> Also I would like to hear your thoughts on how to approach scaling the
> pattern matching query. Note all these techniques talk about scaling a
> single query.
> I've read various ways such as
>
> 1.  Merging equivalent runs[1] -: This seems a good way to squash multiple
> instances of pattern matching forks into a single one if they have the same
> state.
> But I'm not sure how we would implement this in Flink since this is a
> runtime optimization.
>
> 2.  Implementing a matched version buffer[1] -: This would involve sharing
> state of a buffer datastructure across multiple candidate match instances
> for the pattern.
>
> 3.  Splitting complex composite patterns into simpler sub-patterns[4] and
> executing separate queries to detect those sub-patterns. This might
> translate into different
> tasks and duplicating the source datastreams to all the new generated
> tasks.
>
> Also since I don't know how the Flink compiler behaves, would some of the
> optimizations involve making changes to it too?
>
> Regards,
> Akshay Dixit
>
> [1] : Efficient Pattern Matching over Event Streams
> <http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf>
> [2] : On Supporting Kleene Closure over Event Streams
> <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf>
> [3] : Processing Flows of Information: From Data Stream to Complex Event
> Processing
> <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf>
> [4] : Distributing Complex Event Detection
> <http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf>
>
> On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <ba...@gmail.com>
> wrote:
>
>> Dear Akshay,
>>
>> Thanks again for your interest and for the recent contribution to
>> streaming.
>>
>> Both of the projects mentioned wold be largely appreciated by the
>> community, and you can also propose other project suggestions here for
>> discussion.
>>
>> Regarding FLINK-1534, the thesis I mentioned serves as a starting point
>> and
>> indeed the basic solution can be implemented with filtering and
>> windowing/mapping with some state storing whether the cause of an event
>> has
>> been already seen. Solely relying on the now existing windowing API this
>> however might cause performance issues if the events also have an
>> expiration timeout - some optimization there would be included. The
>> further
>> challenge is to try to further exploit the parallel job execution of Flink
>> to possibly scale a pattern matching query.
>>
>> Best,
>>
>> Marton
>>
>> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <ak...@gmail.com>
>> wrote:
>>
>> > Hi,
>> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, India. I'm
>> > currently interested in distributed systems and stream processing and am
>> > looking to delve deeper into the subject, and hope to get some insight
>> by
>> > contributing to Apache Flink. I've gathered some idea of the
>> > flink-streaming codebase by recently working on a PR for FLINK-1450[2].
>> >
>> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects that I
>> would
>> > love to work on over the summer. I was wondering which amongst these
>> would
>> > be more appreciated by the community, so I can start working towards a
>> > proposal for either one.
>> >
>> > Regarding FLINK-1534, I was wondering why would simply merging and
>> > filtering the existing streams for events we want to detect not work?
>> Also
>> > on going through the document mentioned by @mbalassi in the JIRA
>> > comment[5], the authors specify some Runtime Event Detection concepts in
>> > Section 5.2. I'm assuming the project entails on building a similar
>> analogy
>> > using Flink and the deliverables would include working pattern matching
>> > operators over Flink DataStreams as described in the report. If so, then
>> > shouldn't it be trivial to implement the described the Binary operator
>> > using a WindowedStream and a Filter?
>> > I hope my questions don't seem misplaced here and I would appreciate
>> links
>> > to literature where I can learn more on the topic.
>> >
>> > Regards,
>> > Akshay Dixit
>> >
>> > [1] : http://akshaydixi.me
>> > [2] : https://github.com/apache/flink/pull/481
>> > [3] : https://issues.apache.org/jira/browse/FLINK-1617
>> > [4] : https://issues.apache.org/jira/browse/FLINK-1534
>> > [5] :
>> > http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
>> >
>>
>
>

Re: [GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534

Posted by Akshay Dixit <ak...@gmail.com>.
Thanks for the explanation Marton. I've decided to try out for FLINK-1534.

After reading through the thesis[4] and a few other papers[1][2][3], I
believe I've gathered enough context to ask more questions. I'm still not
sure how Flink's internals work, so please bear with me. That said, the
ongoing effort to document the architecture and internals is really helpful
for newbies like me and should greatly reduce the ramp-up time.

Detecting a pattern of events would involve a pipeline that accepts a
pattern query and one or more source DataStreams, and either outputs
detected matches of that pattern to a sink or forwards them along to
another stream for further computation.
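
To make sure I understand the contract, here is roughly the shape I have in
mind, as a plain-Java sketch. All of these names (PatternQuery,
PipelineShape, run) are made up by me for illustration and are not existing
Flink API; the real pipeline would of course be expressed with DataStream
operators.

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PipelineShape {

    /** A pattern query, fed one event at a time, emitting completed matches. */
    interface PatternQuery<E, M> {
        List<M> onEvent(E event);
    }

    /** The pipeline: pull events from a source, run the query, push matches on. */
    static <E, M> void run(Iterable<E> source, PatternQuery<E, M> query) {
        for (E event : source) {
            for (M match : query.onEvent(event)) {
                // Stand-in for a sink or a downstream stream.
                System.out.println("match: " + match);
            }
        }
    }

    public static void main(String[] args) {
        // Trivial "query": report every event that contains "ERROR".
        PatternQuery<String, String> containsError = event ->
                event.contains("ERROR")
                        ? Collections.singletonList(event)
                        : Collections.<String>emptyList();
        run(Arrays.asList("ok", "ERROR disk full", "ok"), containsError);
    }
}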

As you said, a simple filter-join-aggregate query system could be developed
using the existing streaming windowing API. But matching over complex
events and decoding their pattern queries would require implementing a DSL
that transforms queries into an evaluation model. For example, in [1] the
authors implement an NFA with a shared, versioned buffer that models the
queries, while in [4] the authors propose a much more expressive language
that compiles into a topology graph for Storm.
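
To get some intuition for what such an evaluation model boils down to, I
put together the toy sketch below: an NFA-style matcher for a simple "A
followed by B within a timeout" pattern, where every open run is an
automaton instance sitting in the "seen A" state. It is plain Java on
purpose, independent of the Flink API, all names are my own, and unlike
the real design in [1] it does not share the buffered events between runs.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FollowedByMatcher {

    /** A partial run: we have seen an A event and are waiting for a B. */
    private static final class Run {
        final String aPayload;
        final long aTimestamp;

        Run(String aPayload, long aTimestamp) {
            this.aPayload = aPayload;
            this.aTimestamp = aTimestamp;
        }
    }

    private final long timeoutMillis;
    private final List<Run> openRuns = new ArrayList<>();

    public FollowedByMatcher(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Feed one event; returns the matches completed by it. */
    public List<String> onEvent(String type, String payload, long timestamp) {
        // Expire runs whose A event is older than the timeout.
        Iterator<Run> it = openRuns.iterator();
        while (it.hasNext()) {
            if (timestamp - it.next().aTimestamp > timeoutMillis) {
                it.remove();
            }
        }
        List<String> matches = new ArrayList<>();
        if ("A".equals(type)) {
            // Every A starts a new run (skip-till-any-match style selection).
            openRuns.add(new Run(payload, timestamp));
        } else if ("B".equals(type)) {
            // Runs stay open, so one A may pair with several later Bs.
            for (Run run : openRuns) {
                matches.add(run.aPayload + " -> " + payload);
            }
        }
        return matches;
    }
}

For example, feeding ("A", "a1", 0) and then ("B", "b1", 50) into a matcher
with a 100 ms timeout yields the single match "a1 -> b1".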

So in Flink's case, I believe the proposed DSL would generate operator
graphs for the Flink compiler to schedule as JobGraphs over TaskManagers.
If we don't depend on the windowing API, would we need to create new
operators such as the Projection, Conjunction and Union operators defined
in [4]?
I would also like to hear your thoughts on how to approach scaling a
pattern matching query. Note that all of these techniques talk about
scaling a single query. I've read about several approaches:

1.  Merging equivalent runs[1]: This seems like a good way to squash
multiple forked pattern-matching runs into a single one if they share the
same state (see the sketch after this list). I'm not sure how we would
implement this in Flink, though, since it is a runtime optimization.

2.  Implementing a shared, versioned match buffer[1]: This would involve
sharing the state of a buffer data structure across multiple candidate
match instances for the pattern.

3.  Splitting complex composite patterns into simpler sub-patterns[4] and
executing separate queries to detect those sub-patterns. This might
translate into separate tasks, with the source DataStreams duplicated to
all of the newly generated tasks.
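
To spell out what I mean by merging equivalent runs in point 1, here is a
tiny hypothetical sketch: if two partial runs are in the same automaton
state and agree on every attribute the remaining pattern can still refer
to, they will accept exactly the same future events, so keeping a single
representative (plus a count) is enough. The key construction below is
invented for illustration and is not how [1] or Flink actually represent
runs.

import java.util.HashMap;
import java.util.Map;

public class RunMerger {

    /** A partial run reduced to its automaton state plus the attributes
     *  that future transitions may still predicate on. */
    public static final class RunKey {
        final String nfaState;
        final String relevantAttributes;   // e.g. "symbol=IBM,minPrice=101"

        public RunKey(String nfaState, String relevantAttributes) {
            this.nfaState = nfaState;
            this.relevantAttributes = relevantAttributes;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RunKey)) {
                return false;
            }
            RunKey other = (RunKey) o;
            return nfaState.equals(other.nfaState)
                    && relevantAttributes.equals(other.relevantAttributes);
        }

        @Override
        public int hashCode() {
            return 31 * nfaState.hashCode() + relevantAttributes.hashCode();
        }
    }

    /** Collapse equivalent runs, remembering how many originals each key stands for. */
    public static Map<RunKey, Integer> merge(Iterable<RunKey> runs) {
        Map<RunKey, Integer> merged = new HashMap<>();
        for (RunKey run : runs) {
            Integer count = merged.get(run);
            merged.put(run, count == null ? 1 : count + 1);
        }
        return merged;
    }
}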

Also, since I don't know how the Flink compiler behaves: would some of
these optimizations involve making changes to it as well?

Regards,
Akshay Dixit

[1] : Efficient Pattern Matching over Event Streams
<http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf>
[2] : On Supporting Kleene Closure over Event Streams
<http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf>
[3] : Processing Flows of Information: From Data Stream to Complex Event
Processing
<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf>
[4] : Distributing Complex Event Detection
<http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf>

On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <ba...@gmail.com>
wrote:

> Dear Akshay,
>
> Thanks again for your interest and for the recent contribution to
> streaming.
>
> Both of the projects mentioned wold be largely appreciated by the
> community, and you can also propose other project suggestions here for
> discussion.
>
> Regarding FLINK-1534, the thesis I mentioned serves as a starting point and
> indeed the basic solution can be implemented with filtering and
> windowing/mapping with some state storing whether the cause of an event has
> been already seen. Solely relying on the now existing windowing API this
> however might cause performance issues if the events also have an
> expiration timeout - some optimization there would be included. The further
> challenge is to try to further exploit the parallel job execution of Flink
> to possibly scale a pattern matching query.
>
> Best,
>
> Marton
>
> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <ak...@gmail.com>
> wrote:
>
> > Hi,
> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, India. I'm
> > currently interested in distributed systems and stream processing and am
> > looking to delve deeper into the subject, and hope to get some insight by
> > contributing to Apache Flink. I've gathered some idea of the
> > flink-streaming codebase by recently working on a PR for FLINK-1450[2].
> >
> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects that I
> would
> > love to work on over the summer. I was wondering which amongst these
> would
> > be more appreciated by the community, so I can start working towards a
> > proposal for either one.
> >
> > Regarding FLINK-1534, I was wondering why would simply merging and
> > filtering the existing streams for events we want to detect not work?
> Also
> > on going through the document mentioned by @mbalassi in the JIRA
> > comment[5], the authors specify some Runtime Event Detection concepts in
> > Section 5.2. I'm assuming the project entails on building a similar
> analogy
> > using Flink and the deliverables would include working pattern matching
> > operators over Flink DataStreams as described in the report. If so, then
> > shouldn't it be trivial to implement the described the Binary operator
> > using a WindowedStream and a Filter?
> > I hope my questions don't seem misplaced here and I would appreciate
> links
> > to literature where I can learn more on the topic.
> >
> > Regards,
> > Akshay Dixit
> >
> > [1] : http://akshaydixi.me
> > [2] : https://github.com/apache/flink/pull/481
> > [3] : https://issues.apache.org/jira/browse/FLINK-1617
> > [4] : https://issues.apache.org/jira/browse/FLINK-1534
> > [5] :
> > http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
> >
>

Re: [GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534

Posted by Márton Balassi <ba...@gmail.com>.
Dear Akshay,

Thanks again for your interest and for the recent contribution to streaming.

Both of the projects mentioned would be greatly appreciated by the
community, and you can also propose other project suggestions here for
discussion.

Regarding FLINK-1534, the thesis I mentioned serves as a starting point,
and indeed a basic solution can be implemented with filtering and
windowing/mapping, plus some state storing whether the cause of an event
has already been seen. Relying solely on the existing windowing API,
however, might cause performance issues if the events also have an
expiration timeout, so some optimization there would be part of the
project. The further challenge is to exploit Flink's parallel job execution
to possibly scale a pattern matching query.
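
A rough sketch of the kind of state I mean, in plain Java so it does not
commit to any particular Flink operator (CauseEffectDetector and the other
names are only placeholders, not an existing API): remember pending causes
per key, fire a detection when the matching effect arrives, and evict
causes once their timeout has passed.

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class CauseEffectDetector {

    private final long timeoutMillis;
    // key of the cause event -> timestamp at which the cause was seen
    private final Map<String, Long> pendingCauses = new HashMap<>();

    public CauseEffectDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns a detection message if this event completes a cause/effect pair. */
    public String onEvent(String kind, String key, long timestamp) {
        expire(timestamp);
        if ("cause".equals(kind)) {
            pendingCauses.put(key, timestamp);
            return null;
        }
        if ("effect".equals(kind) && pendingCauses.remove(key) != null) {
            return "detected cause -> effect for key " + key;
        }
        return null;
    }

    /** Drop causes that have waited longer than the expiration timeout. */
    private void expire(long now) {
        Iterator<Map.Entry<String, Long>> it = pendingCauses.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue() > timeoutMillis) {
                it.remove();
            }
        }
    }
}

Getting rid of the expired causes efficiently is exactly where a purely
windowing-based approach gets awkward, which is where the optimization
work would come in.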

Best,

Marton

On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <ak...@gmail.com> wrote:

> Hi,
> I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, India. I'm
> currently interested in distributed systems and stream processing and am
> looking to delve deeper into the subject, and hope to get some insight by
> contributing to Apache Flink. I've gathered some idea of the
> flink-streaming codebase by recently working on a PR for FLINK-1450[2].
>
> Both FLINK-1617[3] and FLINK-1534[4] are interesting projects that I would
> love to work on over the summer. I was wondering which amongst these would
> be more appreciated by the community, so I can start working towards a
> proposal for either one.
>
> Regarding FLINK-1534, I was wondering why would simply merging and
> filtering the existing streams for events we want to detect not work? Also
> on going through the document mentioned by @mbalassi in the JIRA
> comment[5], the authors specify some Runtime Event Detection concepts in
> Section 5.2. I'm assuming the project entails on building a similar analogy
> using Flink and the deliverables would include working pattern matching
> operators over Flink DataStreams as described in the report. If so, then
> shouldn't it be trivial to implement the described the Binary operator
> using a WindowedStream and a Filter?
> I hope my questions don't seem misplaced here and I would appreciate links
> to literature where I can learn more on the topic.
>
> Regards,
> Akshay Dixit
>
> [1] : http://akshaydixi.me
> [2] : https://github.com/apache/flink/pull/481
> [3] : https://issues.apache.org/jira/browse/FLINK-1617
> [4] : https://issues.apache.org/jira/browse/FLINK-1534
> [5] :
> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
>