Posted to issues@flink.apache.org by "ZhangBaoquan (Jira)" <ji...@apache.org> on 2021/01/11 10:08:00 UTC

[jira] [Comment Edited] (FLINK-20898) Code of BatchExpand & LocalNoGroupingAggregateWithoutKeys grows beyond 64 KB

    [ https://issues.apache.org/jira/browse/FLINK-20898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262534#comment-17262534 ] 

ZhangBaoquan edited comment on FLINK-20898 at 1/11/21, 10:07 AM:
-----------------------------------------------------------------

With versions 1.11.2, 1.11.3, and 1.12.0, I have a similar problem. When I run a complex SQL query like:

 
{code:java}
select a,b,c,
sum(some_col),
....(A lot of aggregation operations)
group by a,b,c
{code}
 

in batch mode with the Blink planner, it throws an exception like the following (full log in [^exception.log]):
{code:java}
Caused by: org.codehaus.janino.InternalCompilerException: Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$3329" grows beyond 64 KB{code}
However, the same query runs normally in streaming mode.


was (Author: fifth):
I have a similar problem. When I run a complex SQL query like:

 
{code:java}
select a,b,c,
sum(some_col),
....(A lot of aggregation operations)
group by a,b,c
{code}
 

in batch mode, it throws an exception like the following (full log in [^exception.log]):
{code:java}
Caused by: org.codehaus.janino.InternalCompilerException: Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$3329" grows beyond 64 KB{code}
However, the same query runs normally in streaming mode.

> Code of BatchExpand & LocalNoGroupingAggregateWithoutKeys grows beyond 64 KB 
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-20898
>                 URL: https://issues.apache.org/jira/browse/FLINK-20898
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>            Reporter: Sebastian Liu
>            Priority: Major
>         Attachments: exception.log
>
>
> When we write a complex batch aggregation SQL query, the generated code for the BatchExpand and LocalNoGroupingAggregateWithoutKeys operators can easily exceed the 64 KB limit that the JVM places on the bytecode of a single method, especially in the ANALYZE TABLE scenario.
> For a simple statement such as
> {code:java}
> analyze table tpc_ds.call_center compute statistics for all columns{code}
> the underlying SQL that is executed is:
> {code:java}
> SELECT CAST(COUNT(1) AS BIGINT),
>     CAST(COUNT(DISTINCT `cc_call_center_sk`) AS BIGINT),
>     CAST(
>         (COUNT(1) - COUNT(`cc_call_center_sk`)) AS BIGINT
>     ),
>     CAST(8.0 AS DOUBLE),
>     CAST(8.0 AS INTEGER),
>     CAST(MAX(`cc_call_center_sk`) AS BIGINT),
>     CAST(MIN(`cc_call_center_sk`) AS BIGINT),
>     CAST(COUNT(DISTINCT `cc_call_center_id`) AS BIGINT),
>     CAST(
>         (COUNT(1) - COUNT(`cc_call_center_id`)) AS BIGINT
>     ),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_call_center_id`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_call_center_id`)) AS INTEGER),
>     CAST(MAX(`cc_call_center_id`) AS VARCHAR),
>     CAST(MIN(`cc_call_center_id`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_rec_start_date`) AS BIGINT),
>     CAST(
>         (COUNT(1) - COUNT(`cc_rec_start_date`)) AS BIGINT
>     ),
>     CAST(12.0 AS DOUBLE),
>     CAST(12.0 AS INTEGER),
>     CAST(MAX(`cc_rec_start_date`) AS DATE),
>     CAST(MIN(`cc_rec_start_date`) AS DATE),
>     CAST(COUNT(DISTINCT `cc_rec_end_date`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_rec_end_date`)) AS BIGINT),
>     CAST(12.0 AS DOUBLE),
>     CAST(12.0 AS INTEGER),
>     CAST(MAX(`cc_rec_end_date`) AS DATE),
>     CAST(MIN(`cc_rec_end_date`) AS DATE),
>     CAST(COUNT(DISTINCT `cc_closed_date_sk`) AS BIGINT),
>     CAST(
>         (COUNT(1) - COUNT(`cc_closed_date_sk`)) AS BIGINT
>     ),
>     CAST(8.0 AS DOUBLE),
>     CAST(8.0 AS INTEGER),
>     CAST(MAX(`cc_closed_date_sk`) AS BIGINT),
>     CAST(MIN(`cc_closed_date_sk`) AS BIGINT),
>     CAST(COUNT(DISTINCT `cc_open_date_sk`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_open_date_sk`)) AS BIGINT),
>     CAST(8.0 AS DOUBLE),
>     CAST(8.0 AS INTEGER),
>     CAST(MAX(`cc_open_date_sk`) AS BIGINT),
>     CAST(MIN(`cc_open_date_sk`) AS BIGINT),
>     CAST(COUNT(DISTINCT `cc_name`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_name`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_name`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_name`)) AS INTEGER),
>     CAST(MAX(`cc_name`) AS VARCHAR),
>     CAST(MIN(`cc_name`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_class`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_class`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_class`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_class`)) AS INTEGER),
>     CAST(MAX(`cc_class`) AS VARCHAR),
>     CAST(MIN(`cc_class`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_employees`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_employees`)) AS BIGINT),
>     CAST(4.0 AS DOUBLE),
>     CAST(4.0 AS INTEGER),
>     CAST(MAX(`cc_employees`) AS INTEGER),
>     CAST(MIN(`cc_employees`) AS INTEGER),
>     CAST(COUNT(DISTINCT `cc_sq_ft`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_sq_ft`)) AS BIGINT),
>     CAST(4.0 AS DOUBLE),
>     CAST(4.0 AS INTEGER),
>     CAST(MAX(`cc_sq_ft`) AS INTEGER),
>     CAST(MIN(`cc_sq_ft`) AS INTEGER),
>     CAST(COUNT(DISTINCT `cc_hours`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_hours`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_hours`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_hours`)) AS INTEGER),
>     CAST(MAX(`cc_hours`) AS VARCHAR),
>     CAST(MIN(`cc_hours`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_manager`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_manager`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_manager`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_manager`)) AS INTEGER),
>     CAST(MAX(`cc_manager`) AS VARCHAR),
>     CAST(MIN(`cc_manager`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_mkt_id`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_mkt_id`)) AS BIGINT),
>     CAST(4.0 AS DOUBLE),
>     CAST(4.0 AS INTEGER),
>     CAST(MAX(`cc_mkt_id`) AS INTEGER),
>     CAST(MIN(`cc_mkt_id`) AS INTEGER),
>     CAST(COUNT(DISTINCT `cc_mkt_class`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_mkt_class`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_mkt_class`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_mkt_class`)) AS INTEGER),
>     CAST(MAX(`cc_mkt_class`) AS VARCHAR),
>     CAST(MIN(`cc_mkt_class`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_mkt_desc`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_mkt_desc`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_mkt_desc`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_mkt_desc`)) AS INTEGER),
>     CAST(MAX(`cc_mkt_desc`) AS VARCHAR),
>     CAST(MIN(`cc_mkt_desc`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_market_manager`) AS BIGINT),
>     CAST(
>         (COUNT(1) - COUNT(`cc_market_manager`)) AS BIGINT
>     ),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_market_manager`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_market_manager`)) AS INTEGER),
>     CAST(MAX(`cc_market_manager`) AS VARCHAR),
>     CAST(MIN(`cc_market_manager`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_division`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_division`)) AS BIGINT),
>     CAST(4.0 AS DOUBLE),
>     CAST(4.0 AS INTEGER),
>     CAST(MAX(`cc_division`) AS INTEGER),
>     CAST(MIN(`cc_division`) AS INTEGER),
>     CAST(COUNT(DISTINCT `cc_division_name`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_division_name`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_division_name`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_division_name`)) AS INTEGER),
>     CAST(MAX(`cc_division_name`) AS VARCHAR),
>     CAST(MIN(`cc_division_name`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_company`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_company`)) AS BIGINT),
>     CAST(4.0 AS DOUBLE),
>     CAST(4.0 AS INTEGER),
>     CAST(MAX(`cc_company`) AS INTEGER),
>     CAST(MIN(`cc_company`) AS INTEGER),
>     CAST(COUNT(DISTINCT `cc_company_name`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_company_name`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_company_name`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_company_name`)) AS INTEGER),
>     CAST(MAX(`cc_company_name`) AS VARCHAR),
>     CAST(MIN(`cc_company_name`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_street_number`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_street_number`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_street_number`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_street_number`)) AS INTEGER),
>     CAST(MAX(`cc_street_number`) AS VARCHAR),
>     CAST(MIN(`cc_street_number`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_street_name`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_street_name`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_street_name`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_street_name`)) AS INTEGER),
>     CAST(MAX(`cc_street_name`) AS VARCHAR),
>     CAST(MIN(`cc_street_name`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_street_type`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_street_type`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_street_type`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_street_type`)) AS INTEGER),
>     CAST(MAX(`cc_street_type`) AS VARCHAR),
>     CAST(MIN(`cc_street_type`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_suite_number`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_suite_number`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_suite_number`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_suite_number`)) AS INTEGER),
>     CAST(MAX(`cc_suite_number`) AS VARCHAR),
>     CAST(MIN(`cc_suite_number`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_city`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_city`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_city`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_city`)) AS INTEGER),
>     CAST(MAX(`cc_city`) AS VARCHAR),
>     CAST(MIN(`cc_city`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_county`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_county`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_county`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_county`)) AS INTEGER),
>     CAST(MAX(`cc_county`) AS VARCHAR),
>     CAST(MIN(`cc_county`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_state`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_state`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_state`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_state`)) AS INTEGER),
>     CAST(MAX(`cc_state`) AS VARCHAR),
>     CAST(MIN(`cc_state`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_zip`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_zip`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_zip`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_zip`)) AS INTEGER),
>     CAST(MAX(`cc_zip`) AS VARCHAR),
>     CAST(MIN(`cc_zip`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_country`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_country`)) AS BIGINT),
>     CAST(
>         AVG(CAST(CHAR_LENGTH(`cc_country`) AS DOUBLE)) AS DOUBLE
>     ),
>     CAST(MAX(CHAR_LENGTH(`cc_country`)) AS INTEGER),
>     CAST(MAX(`cc_country`) AS VARCHAR),
>     CAST(MIN(`cc_country`) AS VARCHAR),
>     CAST(COUNT(DISTINCT `cc_gmt_offset`) AS BIGINT),
>     CAST((COUNT(1) - COUNT(`cc_gmt_offset`)) AS BIGINT),
>     CAST(8.0 AS DOUBLE),
>     CAST(8.0 AS INTEGER),
>     CAST(MAX(`cc_gmt_offset`) AS DOUBLE),
>     CAST(MIN(`cc_gmt_offset`) AS DOUBLE),
>     CAST(COUNT(DISTINCT `cc_tax_percentage`) AS BIGINT),
>     CAST(
>         (COUNT(1) - COUNT(`cc_tax_percentage`)) AS BIGINT
>     ),
>     CAST(8.0 AS DOUBLE),
>     CAST(8.0 AS INTEGER),
>     CAST(MAX(`cc_tax_percentage`) AS DOUBLE),
>     CAST(MIN(`cc_tax_percentage`) AS DOUBLE)
> FROM `bytedance_hive`.`tpc_ds`.`call_center`
> {code}
> For this SQL, compiling the generated code on the TaskManager (TM) side fails with the following root-cause exceptions:
> {code:java}
> Caused by: org.codehaus.janino.InternalCompilerException: Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "BatchExpand$36757" grows beyond 64 KB
> {code}
> and
> {code:java}
> Caused by: org.codehaus.janino.InternalCompilerException: Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "LocalNoGroupingAggregateWithoutKeys$34429" grows beyond 64 KB
> {code}
> We need to split the generated code for BatchExpand and LocalNoGroupingAggregateWithoutKeys for complex SQL, just as ExprCodeGenerator and AggsHandlerCodeGenerator already do.
>  
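The ANALYZE TABLE statement quoted in the issue description follows a mechanical pattern: after a single COUNT(1), every column contributes six expressions (distinct count, null count, average length, maximum length, MAX, and MIN). The select list therefore grows linearly with the number of columns, and every expression ends up as code in the operators named above. The following Java sketch rebuilds that select list from a column list to make the growth visible. It is a hypothetical illustration inferred only from the quoted SQL: the class `AnalyzeSqlSketch`, its `Column` model, and the fixed-width values are assumptions, not Flink's ANALYZE TABLE implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: rebuilds the SELECT list of an ANALYZE-style statistics
 * query from a column list, following the per-column pattern of the quoted SQL.
 * This is NOT Flink's implementation.
 */
final class AnalyzeSqlSketch {

    /** Simplified, hypothetical column model. */
    static final class Column {
        final String name;
        final String type;       // result type of MAX/MIN, e.g. "BIGINT", "DATE", "VARCHAR"
        final Double fixedWidth; // constant length for fixed-width types; null for strings

        Column(String name, String type, Double fixedWidth) {
            this.name = name;
            this.type = type;
            this.fixedWidth = fixedWidth;
        }
    }

    /** Returns one COUNT(1) expression followed by six expressions per column. */
    static List<String> selectList(List<Column> columns) {
        List<String> exprs = new ArrayList<>();
        exprs.add("CAST(COUNT(1) AS BIGINT)");
        for (Column c : columns) {
            String col = "`" + c.name + "`";
            exprs.add("CAST(COUNT(DISTINCT " + col + ") AS BIGINT)");
            exprs.add("CAST((COUNT(1) - COUNT(" + col + ")) AS BIGINT)");
            if (c.fixedWidth == null) {
                // Variable-length column: average and maximum length are computed.
                exprs.add("CAST(AVG(CAST(CHAR_LENGTH(" + col + ") AS DOUBLE)) AS DOUBLE)");
                exprs.add("CAST(MAX(CHAR_LENGTH(" + col + ")) AS INTEGER)");
            } else {
                // Fixed-width column: average and maximum length are constants.
                exprs.add("CAST(" + c.fixedWidth + " AS DOUBLE)");
                exprs.add("CAST(" + c.fixedWidth + " AS INTEGER)");
            }
            exprs.add("CAST(MAX(" + col + ") AS " + c.type + ")");
            exprs.add("CAST(MIN(" + col + ") AS " + c.type + ")");
        }
        return exprs;
    }

    public static void main(String[] args) {
        List<Column> cols = List.of(
                new Column("cc_call_center_sk", "BIGINT", 8.0),
                new Column("cc_call_center_id", "VARCHAR", null));
        List<String> exprs = selectList(cols);
        System.out.println("SELECT " + String.join(",\n    ", exprs));
        System.out.println(exprs.size() + " expressions for " + cols.size() + " columns");
    }
}
```

With the 31 columns of `call_center`, this pattern yields 1 + 6 × 31 = 187 select expressions, 31 of them `COUNT(DISTINCT ...)`, which matches the quoted statement.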

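The fix proposed in the issue is to split the oversized generated methods, as ExprCodeGenerator and AggsHandlerCodeGenerator already do. The general technique can be sketched in a few lines of Java: pack the generated statements into helper methods whose size stays under a budget, and have the original method call those helpers in order. This is a hypothetical illustration, not Flink's implementation. The class `MethodSplitSketch`, the `_splitN` naming, the `RowData in` parameter text, and the use of source length as a proxy for bytecode size are all assumptions, and the sketch presumes the statements share no local variables (a real generator would have to promote such locals to fields or pass them explicitly).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of method splitting for generated code. Source length is
 * only a rough proxy for the JVM's 64 KB bytecode limit, and statements are
 * assumed not to share local variables. This is NOT Flink's implementation.
 */
final class MethodSplitSketch {

    /**
     * Greedily groups statements so that each group's total length stays within
     * maxChars. A single statement longer than maxChars still gets its own group.
     */
    static List<List<String>> chunk(List<String> statements, int maxChars) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int size = 0;
        for (String stmt : statements) {
            // Close the current group if this statement would push it over budget.
            if (!current.isEmpty() && size + stmt.length() > maxChars) {
                chunks.add(current);
                current = new ArrayList<>();
                size = 0;
            }
            current.add(stmt);
            size += stmt.length();
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }

    /** Emits one private helper per group plus a public method that calls them in order. */
    static String emit(String methodName, String params, String args,
                       List<String> statements, int maxChars) {
        List<List<String>> chunks = chunk(statements, maxChars);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chunks.size(); i++) {
            sb.append("private void ").append(methodName).append("_split").append(i)
              .append("(").append(params).append(") throws Exception {\n");
            for (String stmt : chunks.get(i)) {
                sb.append("  ").append(stmt).append("\n");
            }
            sb.append("}\n");
        }
        sb.append("public void ").append(methodName).append("(").append(params)
          .append(") throws Exception {\n");
        for (int i = 0; i < chunks.size(); i++) {
            sb.append("  ").append(methodName).append("_split").append(i)
              .append("(").append(args).append(");\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative generated statements that update hypothetical accumulator fields.
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            stmts.add("agg" + i + " = agg" + i + " + in.getLong(" + i + ");");
        }
        System.out.print(emit("processElement", "RowData in", "in", stmts, 80));
    }
}
```

Because the dispatching method holds only one call per helper, it stays small no matter how many aggregates the query has. A single statement that alone exceeds the budget still lands in an oversized helper, so a complete solution would also need to split inside large expressions.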


--
This message was sent by Atlassian Jira
(v8.3.4#803005)