Posted to dev@flink.apache.org by Laurens VIJNCK <la...@uhasselt.be> on 2019/12/14 14:52:26 UTC

Worst-case optimal join processing on Streams

Dear folks,

DISCLAIMER: With this mail, my sole intention is to establish contact with the community and trade ideas on how to realize the goal described below.

I'm a starting PhD researcher in distributed systems and databases who is particularly interested in worst-case optimal (multiway) join processing on streams. I have performed preliminary tests with a new join algorithm that shows rather promising results. However, its limitation is that it operates in a centralized fashion. My goal is to extend the algorithm so that it can operate in a distributed environment. To showcase my results, I want to implement a proof-of-concept in Apache Flink. I know this is a rather ambitious project, which is why I am reaching out to the community.

I have gone through most of the application development documentation on the website (e.g., [1, 2, 3, 4]), but I am now eager to learn more about Flink's internals. Specifically, I want to gain more insight into the lifecycle of a query in Flink. Is there any additional documentation available on this subject?

Thanks in advance.

[1] https://flink.apache.org/news/2015/04/13/release-0.9.0-milestone1.html
[2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/dynamic_tables.html
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/streaming/joins.html
[4] https://cwiki.apache.org/confluence/display/FLINK/Optimizer+Internals

Kind regards,

Laurens Vijnck

Re: Worst-case optimal join processing on Streams

Posted by Kurt Young <yk...@gmail.com>.

Hi Laurens,

Good to hear that you are interested in optimizing Flink's join strategy. If you want to learn more about the lifecycle of a query in Flink, I would recommend reading the original design doc of the Flink Table & SQL module [1]. I hope it helps.

Best,
Kurt

[1] https://docs.google.com/document/d/1TLayJNOTBle_-m1rQfgA6Ouj1oYsfqRjPcp1h2TVqdI/

