Posted to issues@flink.apache.org by "Chunhui Shi (JIRA)" <ji...@apache.org> on 2019/02/14 01:18:00 UTC

[jira] [Updated] (FLINK-11421) Add compilation options to allow compiling generated code with JDK compiler

     [ https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chunhui Shi updated FLINK-11421:
--------------------------------
    Summary: Add compilation options to allow compiling generated code with JDK compiler   (was: Providing more compilation options for code-generated operators)

> Add compilation options to allow compiling generated code with JDK compiler 
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-11421
>                 URL: https://issues.apache.org/jira/browse/FLINK-11421
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Liya Fan
>            Assignee: Liya Fan
>            Priority: Major
>              Labels: pull-request-available
>   Original Estimate: 240h
>          Time Spent: 10m
>  Remaining Estimate: 239h 50m
>
> Flink implements some operators (such as Calc, Hash Agg, and Hash Join) through code generation. That is, Flink generates their source code dynamically, then compiles it into Java bytecode, which is loaded and executed at runtime.
>  
> By default, Flink compiles the generated source code with Janino. This is fast: compilation often finishes within a few hundred milliseconds. The generated Java bytecode, however, is of poor quality. To illustrate, we used the Java Compiler API (JCA) to compile the same generated code. Experiments on TPC-H (1 TB) queries show that the end-to-end (E2E) time can be more than 10% shorter when operators are compiled with JCA, even though JCA compilation takes longer (a few seconds).
>  
> Therefore, we believe it is beneficial to compile generated code with JCA in the following scenarios: 1) For batch jobs, the E2E time is relatively long, so it is worth spending more time on compilation to produce high-quality Java bytecode. 2) For repeated stream jobs, the generated code is compiled once and run many times, so it pays to spend more time compiling the first time and benefit from the higher bytecode quality in later runs.
>  
> Based on these observations, we want to provide a compilation option (Janino, JCA, or dynamic) in Flink, so that users can choose the one that suits their specific scenario and obtain better performance whenever possible.
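The JCA path described in the issue can be sketched with the JDK's own javax.tools API: the generated source is kept in memory, compiled with ToolProvider.getSystemJavaCompiler(), and the resulting bytecode is loaded through a throwaway class loader. This is a minimal sketch under stated assumptions, not Flink's implementation or the code from the pull request: the class names JcaCompileSketch and GeneratedCalc are hypothetical, and it assumes a full JDK (on a bare JRE the system compiler is null).

```java
import javax.tools.DiagnosticCollector;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compiles generated source with the JDK compiler (JCA), entirely in memory.
public class JcaCompileSketch {

    // Generated source code held in memory, so nothing is written to disk.
    static final class SourceFromString extends SimpleJavaFileObject {
        private final String code;

        SourceFromString(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/')
                    + JavaFileObject.Kind.SOURCE.extension), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Sink that captures the bytecode javac emits for one class.
    static final class ClassBytes extends SimpleJavaFileObject {
        private final ByteArrayOutputStream out = new ByteArrayOutputStream();

        ClassBytes(String className) {
            super(URI.create("bytes:///" + className.replace('.', '/')
                    + JavaFileObject.Kind.CLASS.extension), JavaFileObject.Kind.CLASS);
        }

        @Override
        public OutputStream openOutputStream() {
            return out;
        }

        byte[] bytes() {
            return out.toByteArray();
        }
    }

    // Compiles one generated class and loads it; throws IllegalStateException on compile errors.
    static Class<?> compile(String className, String source, List<String> options) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; a JDK is required");
        }
        Map<String, ClassBytes> outputs = new HashMap<>();
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        StandardJavaFileManager std = compiler.getStandardFileManager(diagnostics, null, null);
        try (JavaFileManager fm = new ForwardingJavaFileManager<JavaFileManager>(std) {
            @Override
            public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
                    String name, JavaFileObject.Kind kind, FileObject sibling) {
                // Redirect every emitted class (including nested ones) into memory.
                ClassBytes cb = new ClassBytes(name);
                outputs.put(name, cb);
                return cb;
            }
        }) {
            boolean ok = compiler.getTask(null, fm, diagnostics, options, null,
                    List.of(new SourceFromString(className, source))).call();
            if (!ok) {
                throw new IllegalStateException("Compilation failed: " + diagnostics.getDiagnostics());
            }
        }
        // Define captured classes on demand; all other lookups go to the parent loader.
        ClassLoader loader = new ClassLoader(JcaCompileSketch.class.getClassLoader()) {
            @Override
            protected Class<?> findClass(String name) throws ClassNotFoundException {
                ClassBytes cb = outputs.get(name);
                if (cb == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] b = cb.bytes();
                return defineClass(name, b, 0, b.length);
            }
        };
        return loader.loadClass(className);
    }
}
```

For example, a generated class declared as "public class GeneratedCalc implements java.util.function.LongUnaryOperator" can be compiled with JcaCompileSketch.compile("GeneratedCalc", source, List.of()) and instantiated via reflection. Compiler flags such as -g:none can be passed through the options list; tying that list to a user-facing Janino/JCA/dynamic setting is an assumption here, not something the issue specifies.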



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)