You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Felipe Gutierrez <fe...@gmail.com> on 2018/12/19 14:14:08 UTC

Question about Flink optimizer on Stream API

Hi,

I was reading some FLIP documents related to the new design of the Flink
Schedule [1] and unification of batch and stream [2]. Then I created two
different programs to learn how Flink optimizes the Query Plan in Batch and
in Stream mode (and how much further it goes). One using batch [3] and one
using Stream [4]. During the code debugging and also as it is depicted on
the document [2], the batch program uses the
org.apache.flink.optimizer.Optimizer class which generates a
"org.apache.flink.optimizer.plan.OptimizedPlan" while stream program uses
the "org.apache.flink.streaming.api.graph.StreamGraph" and every
transformation inside the packet
"org.apache.flink.streaming.api.transformations".

When I am showing the execution plan with "env.getExecutionPlan()" I see
exactly I have written on the Flink program (which it is expected).
However, I was looking for where I can see the optimized plan. I mean
decisions of operators reordering based on cost or statistics. For batch I
could find the "org.apache.flink.optimizer.costs.CostEstimator" and
"org.apache.flink.optimizer.DataStatistics". But for Stream I only found
the creation of the plan. How can I debug that? Or have a better
understanding of what Flink is doing. Do you advise me to read some other
reference about this?

Kind Regards,
Felipe

[1] Group-aware scheduling for Flink -
https://docs.google.com/document/d/1q7NOqt05HIN-PlKEEPB36JiuU1Iu9fnxxVGJzylhsxU/edit#heading=h.k15nfgsa5bnk
[2] Unified Core API for Streaming and Batch -
https://docs.google.com/document/d/1G0NUIaaNJvT6CMrNCP6dRXGv88xNhDQqZFrQEuJ0rVU/edit#
[3]
https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/examples/batch/MatrixMultiplication.java
[4]
https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/examples/stream/SensorsReadingMqttJoinQEP.java

*--*
*-- Felipe Gutierrez*

*-- skype: felipe.o.gutierrez*
*--* *https://felipeogutierrez.blogspot.com
<https://felipeogutierrez.blogspot.com>*

Re: Question about Flink optimizer on Stream API

Posted by Till Rohrmann <tr...@apache.org>.
Hi Felipe,

for streaming Flink currently does not optimize the data flow graph. I
think the best reference is actually going through the code as you've done
for the batch case.

Cheers,
Till

On Wed, Dec 19, 2018 at 3:14 PM Felipe Gutierrez <
felipe.o.gutierrez@gmail.com> wrote:

> Hi,
>
> I was reading some FLIP documents related to the new design of the Flink
> Schedule [1] and unification of batch and stream [2]. Then I created two
> different programs to learn how Flink optimizes the Query Plan in Batch and
> in Stream mode (and how much further it goes). One using batch [3] and one
> using Stream [4]. During the code debugging and also as it is depicted on
> the document [2], the batch program uses the
> org.apache.flink.optimizer.Optimizer class which generates a
> "org.apache.flink.optimizer.plan.OptimizedPlan" while stream program uses
> the "org.apache.flink.streaming.api.graph.StreamGraph" and every
> transformation inside the packet
> "org.apache.flink.streaming.api.transformations".
>
> When I am showing the execution plan with "env.getExecutionPlan()" I see
> exactly I have written on the Flink program (which it is expected).
> However, I was looking for where I can see the optimized plan. I mean
> decisions of operators reordering based on cost or statistics. For batch I
> could find the "org.apache.flink.optimizer.costs.CostEstimator" and
> "org.apache.flink.optimizer.DataStatistics". But for Stream I only found
> the creation of the plan. How can I debug that? Or have a better
> understanding of what Flink is doing. Do you advise me to read some other
> reference about this?
>
> Kind Regards,
> Felipe
>
> [1] Group-aware scheduling for Flink -
> https://docs.google.com/document/d/1q7NOqt05HIN-PlKEEPB36JiuU1Iu9fnxxVGJzylhsxU/edit#heading=h.k15nfgsa5bnk
> [2] Unified Core API for Streaming and Batch -
> https://docs.google.com/document/d/1G0NUIaaNJvT6CMrNCP6dRXGv88xNhDQqZFrQEuJ0rVU/edit#
> [3]
> https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/examples/batch/MatrixMultiplication.java
> [4]
> https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/examples/stream/SensorsReadingMqttJoinQEP.java
>
> *--*
> *-- Felipe Gutierrez*
>
> *-- skype: felipe.o.gutierrez*
> *--* *https://felipeogutierrez.blogspot.com
> <https://felipeogutierrez.blogspot.com>*
>

Re: Question about Flink optimizer on Stream API

Posted by Till Rohrmann <tr...@apache.org>.
Hi Felipe,

for streaming Flink currently does not optimize the data flow graph. I
think the best reference is actually going through the code as you've done
for the batch case.

Cheers,
Till

On Wed, Dec 19, 2018 at 3:14 PM Felipe Gutierrez <
felipe.o.gutierrez@gmail.com> wrote:

> Hi,
>
> I was reading some FLIP documents related to the new design of the Flink
> Schedule [1] and unification of batch and stream [2]. Then I created two
> different programs to learn how Flink optimizes the Query Plan in Batch and
> in Stream mode (and how much further it goes). One using batch [3] and one
> using Stream [4]. During the code debugging and also as it is depicted on
> the document [2], the batch program uses the
> org.apache.flink.optimizer.Optimizer class which generates a
> "org.apache.flink.optimizer.plan.OptimizedPlan" while stream program uses
> the "org.apache.flink.streaming.api.graph.StreamGraph" and every
> transformation inside the packet
> "org.apache.flink.streaming.api.transformations".
>
> When I am showing the execution plan with "env.getExecutionPlan()" I see
> exactly I have written on the Flink program (which it is expected).
> However, I was looking for where I can see the optimized plan. I mean
> decisions of operators reordering based on cost or statistics. For batch I
> could find the "org.apache.flink.optimizer.costs.CostEstimator" and
> "org.apache.flink.optimizer.DataStatistics". But for Stream I only found
> the creation of the plan. How can I debug that? Or have a better
> understanding of what Flink is doing. Do you advise me to read some other
> reference about this?
>
> Kind Regards,
> Felipe
>
> [1] Group-aware scheduling for Flink -
> https://docs.google.com/document/d/1q7NOqt05HIN-PlKEEPB36JiuU1Iu9fnxxVGJzylhsxU/edit#heading=h.k15nfgsa5bnk
> [2] Unified Core API for Streaming and Batch -
> https://docs.google.com/document/d/1G0NUIaaNJvT6CMrNCP6dRXGv88xNhDQqZFrQEuJ0rVU/edit#
> [3]
> https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/examples/batch/MatrixMultiplication.java
> [4]
> https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/examples/stream/SensorsReadingMqttJoinQEP.java
>
> *--*
> *-- Felipe Gutierrez*
>
> *-- skype: felipe.o.gutierrez*
> *--* *https://felipeogutierrez.blogspot.com
> <https://felipeogutierrez.blogspot.com>*
>