You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Fabian Hueske (JIRA)" <ji...@apache.org> on 2017/02/22 16:57:44 UTC

[jira] [Commented] (FLINK-5888) ForwardedFields annotation is not generating optimised execution plan in example KMeans job

    [ https://issues.apache.org/jira/browse/FLINK-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878719#comment-15878719 ] 

Fabian Hueske commented on FLINK-5888:
--------------------------------------

Thanks for reporting this issue.

The optimizer optimizes primarily for reduction of network traffic.
The plan with annotations does not add a combiner because it does not need to shuffle data (FORWARD instead of HASH PARTITION ship strategy) between CountAppender and CentroidAccumulator. 
This effect does not become visible because you run the program with a parallelism of 1. You might need to run it on a cluster to see an improvement.

I'm not sure though, why the two mapper are not chained in case of the program with annotations.

> ForwardedFields annotation is not generating optimised execution plan in example KMeans job
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-5888
>                 URL: https://issues.apache.org/jira/browse/FLINK-5888
>             Project: Flink
>          Issue Type: Bug
>          Components: DataSet API, Examples, Java API
>    Affects Versions: 1.1.3
>            Reporter: Ziyad Muhammed Mohiyudheen
>
> Flink KMeans java example [1] shows the usage of ForwardedFields function annotation. How ever, the example job was taking more time than expected on medium sized data itself. By merely removing the function annotation from the example code (with out any other change), a better execution plan and run time was obtained. The execution plan shows that no combiner is used and the two Map tasks are not chained when ForwardedFields is enabled. The experiment is documented in [2]
> [1] https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java
> [2] https://drive.google.com/open?id=0B0IlZv0uHBuvVEZ5ZmNpN19jVVU



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)