Posted to dev@flink.apache.org by Fabian Hueske <fh...@gmail.com> on 2015/03/24 14:48:51 UTC

Re: [jira] [Commented] (FLINK-1319) Add static code analysis for UDFs

Hi Timo, cool stuff!

I agree with Stephan. A separate repository is not necessary because this
feature is opaque to users (except for the activation switch) and might
therefore be added to flink-core, IMO.

The handling of forwarded fields for group-wise operators in the optimizer
is not fully sorted out yet, so that part might need to be adapted
(see FLINK-1656 and PR #525).

For the switch we could offer three options:
- deactivated
- activated hinting (write extracted semantic information to log)
- activated optimizing (use extracted semantic info in optimizer)
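
To make the three options concrete, here is a minimal sketch in plain Java.
The names (Mode, Outcome, handle) are made up for illustration and are not an
existing Flink API; the sketch only shows where the extracted information
would go in each mode.

```java
import java.util.Optional;

public class CodeAnalysisSwitchSketch {

    // Hypothetical names for the three proposed switch positions.
    enum Mode { DISABLED, HINT, OPTIMIZE }

    // Where the extracted semantic information ends up for a given mode.
    static final class Outcome {
        final Optional<String> logLine;        // written to the log, if any
        final Optional<String> optimizerInput; // handed to the optimizer, if any

        Outcome(Optional<String> logLine, Optional<String> optimizerInput) {
            this.logLine = logLine;
            this.optimizerInput = optimizerInput;
        }
    }

    // Routes the extracted information according to the selected mode.
    static Outcome handle(Mode mode, String extractedInfo) {
        switch (mode) {
            case HINT:
                // Only report what the analysis found; the plan is unchanged.
                return new Outcome(
                        Optional.of("Extracted forwarded fields: " + extractedInfo),
                        Optional.empty());
            case OPTIMIZE:
                // Pass the information on so the optimizer can use it.
                return new Outcome(Optional.empty(), Optional.of(extractedInfo));
            default:
                // DISABLED: the analysis result is ignored entirely.
                return new Outcome(Optional.empty(), Optional.empty());
        }
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            Outcome o = handle(mode, "0->3, 1, 2->1");
            System.out.println(mode + ": log=" + o.logLine
                    + ", optimizer=" + o.optimizerInput);
        }
    }
}
```

Keeping hinting and optimizing as separate positions would let users inspect
what the analysis extracts before they trust it to change the execution plan.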

Regarding additional checks we could:
- detect whether a Filter function modifies the record
- check if a Reduce function returns a new record or the first(?) input
record.
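
As an illustration of the two patterns these checks would look for, here is a
small sketch. Note the assumptions: it uses plain java.util.function
interfaces and int[] records instead of Flink's function and record types, and
it detects the patterns by running the function on sample data, which is only
a stand-in for the static analysis discussed in this thread.

```java
import java.util.Arrays;
import java.util.function.BinaryOperator;
import java.util.function.Predicate;

public class UdfCheckSketch {

    // True if the filter changed its input record while evaluating it.
    static boolean filterModifiesRecord(Predicate<int[]> filter, int[] sample) {
        int[] before = Arrays.copyOf(sample, sample.length);
        filter.test(sample);
        return !Arrays.equals(before, sample);
    }

    // Reports which object the reduce function returned for the sample inputs.
    static String reduceReturns(BinaryOperator<int[]> reducer, int[] first, int[] second) {
        int[] result = reducer.apply(first, second);
        if (result == first) {
            return "first input";
        }
        if (result == second) {
            return "second input";
        }
        return "new record";
    }

    public static void main(String[] args) {
        // A well-behaved filter only reads the record.
        Predicate<int[]> readOnly = r -> r[0] > 0;
        // A problematic filter overwrites a field as a side effect.
        Predicate<int[]> modifying = r -> { r[1] = 0; return r[0] > 0; };

        System.out.println(filterModifiesRecord(readOnly, new int[] {1, 2}));
        System.out.println(filterModifiesRecord(modifying, new int[] {1, 2}));

        // One reducer updates and returns its first input, the other allocates.
        BinaryOperator<int[]> inPlace = (a, b) -> { a[1] += b[1]; return a; };
        BinaryOperator<int[]> allocating = (a, b) -> new int[] {a[0], a[1] + b[1]};

        System.out.println(reduceReturns(inPlace, new int[] {1, 2}, new int[] {1, 3}));
        System.out.println(reduceReturns(allocating, new int[] {1, 2}, new int[] {1, 3}));
    }
}
```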

2015-03-24 13:07 GMT+01:00 Maximilian Michels (JIRA) <ji...@apache.org>:

>
>     [
> https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377761#comment-14377761
> ]
>
> Maximilian Michels commented on FLINK-1319:
> -------------------------------------------
>
> This looks like a very promising way to automatically optimize Flink jobs.
>
> +1 for including it in {{flink-staging}}.
> +1 for a switch in the {{ExecutionEnvironment}} to manually turn it on.
>
> > Add static code analysis for UDFs
> > ---------------------------------
> >
> >                 Key: FLINK-1319
> >                 URL: https://issues.apache.org/jira/browse/FLINK-1319
> >             Project: Flink
> >          Issue Type: New Feature
> >          Components: Java API, Scala API
> >            Reporter: Stephan Ewen
> >            Assignee: Timo Walther
> >            Priority: Minor
> >
> > Flink's Optimizer takes information that tells it for UDFs which fields
> of the input elements are accessed, modified, or forwarded/copied. This
> information frequently helps to reuse partitionings, sorts, etc. It may
> speed up programs significantly, as it can frequently eliminate sorts and
> shuffles, which are costly.
> > Right now, users can add lightweight annotations to UDFs to provide this
> information (such as adding {{@ConstantFields("0->3, 1, 2->1")}}).
> > We worked with static code analysis of UDFs before, to determine this
> information automatically. This is an incredible feature, as it "magically"
> makes programs faster.
> > For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross),
> this works surprisingly well in many cases. We used the "Soot" toolkit for
> the static code analysis. Unfortunately, Soot is LGPL licensed and thus we
> did not include any of the code so far.
> > I propose to add this functionality to Flink, in the form of a drop-in
> addition, to work around the LGPL incompatibility with ASL 2.0. Users could
> simply download a special "flink-code-analysis.jar" and drop it into the
> "lib" folder to enable this functionality. We may even add a script to
> "tools" that downloads that library automatically into the lib folder. This
> should be legally fine, since we do not redistribute LGPL code and only
> dynamically link it (the incompatibility with ASL 2.0 is mainly in the
> patentability, if I remember correctly).
> > Prior work on this has been done by [~aljoscha] and [~skunert], which
> could provide a code base to start with.
> > *Appendix*
> > Homepage of the Soot static analysis toolkit:
> http://www.sable.mcgill.ca/soot/
> > Papers on static analysis for optimization:
> http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf
> and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
> > Quick introduction to the Optimizer:
> http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf
> (Section 6)
> > Optimizer for Iterations:
> http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf
> (Sections 4.3 and 5.3)