Posted to issues@flink.apache.org by "Sebastian Liu (Jira)" <ji...@apache.org> on 2021/01/08 10:51:00 UTC
[jira] [Created] (FLINK-20898) Code of BatchExpand & LocalNoGroupingAggregateWithoutKeys grows beyond 64 KB
Sebastian Liu created FLINK-20898:
-------------------------------------
Summary: Code of BatchExpand & LocalNoGroupingAggregateWithoutKeys grows beyond 64 KB
Key: FLINK-20898
URL: https://issues.apache.org/jira/browse/FLINK-20898
Project: Flink
Issue Type: Bug
Components: Table SQL / Planner
Reporter: Sebastian Liu
When we write a complex batch aggregation SQL query, the generated code of the BatchExpand and LocalNoGroupingAggregateWithoutKeys operators can easily exceed the JVM's 64 KB limit on the bytecode size of a single method. This happens especially often in the ANALYZE TABLE scenario.
For a simple SQL statement such as
{code:java}
analyze table tpc_ds.call_center compute statistics for all columns{code}
the underlying SQL that is actually executed is:
{code:java}
SELECT CAST(COUNT(1) AS BIGINT),
CAST(COUNT(DISTINCT `cc_call_center_sk`) AS BIGINT),
CAST(
(COUNT(1) - COUNT(`cc_call_center_sk`)) AS BIGINT
),
CAST(8.0 AS DOUBLE),
CAST(8.0 AS INTEGER),
CAST(MAX(`cc_call_center_sk`) AS BIGINT),
CAST(MIN(`cc_call_center_sk`) AS BIGINT),
CAST(COUNT(DISTINCT `cc_call_center_id`) AS BIGINT),
CAST(
(COUNT(1) - COUNT(`cc_call_center_id`)) AS BIGINT
),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_call_center_id`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_call_center_id`)) AS INTEGER),
CAST(MAX(`cc_call_center_id`) AS VARCHAR),
CAST(MIN(`cc_call_center_id`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_rec_start_date`) AS BIGINT),
CAST(
(COUNT(1) - COUNT(`cc_rec_start_date`)) AS BIGINT
),
CAST(12.0 AS DOUBLE),
CAST(12.0 AS INTEGER),
CAST(MAX(`cc_rec_start_date`) AS DATE),
CAST(MIN(`cc_rec_start_date`) AS DATE),
CAST(COUNT(DISTINCT `cc_rec_end_date`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_rec_end_date`)) AS BIGINT),
CAST(12.0 AS DOUBLE),
CAST(12.0 AS INTEGER),
CAST(MAX(`cc_rec_end_date`) AS DATE),
CAST(MIN(`cc_rec_end_date`) AS DATE),
CAST(COUNT(DISTINCT `cc_closed_date_sk`) AS BIGINT),
CAST(
(COUNT(1) - COUNT(`cc_closed_date_sk`)) AS BIGINT
),
CAST(8.0 AS DOUBLE),
CAST(8.0 AS INTEGER),
CAST(MAX(`cc_closed_date_sk`) AS BIGINT),
CAST(MIN(`cc_closed_date_sk`) AS BIGINT),
CAST(COUNT(DISTINCT `cc_open_date_sk`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_open_date_sk`)) AS BIGINT),
CAST(8.0 AS DOUBLE),
CAST(8.0 AS INTEGER),
CAST(MAX(`cc_open_date_sk`) AS BIGINT),
CAST(MIN(`cc_open_date_sk`) AS BIGINT),
CAST(COUNT(DISTINCT `cc_name`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_name`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_name`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_name`)) AS INTEGER),
CAST(MAX(`cc_name`) AS VARCHAR),
CAST(MIN(`cc_name`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_class`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_class`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_class`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_class`)) AS INTEGER),
CAST(MAX(`cc_class`) AS VARCHAR),
CAST(MIN(`cc_class`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_employees`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_employees`)) AS BIGINT),
CAST(4.0 AS DOUBLE),
CAST(4.0 AS INTEGER),
CAST(MAX(`cc_employees`) AS INTEGER),
CAST(MIN(`cc_employees`) AS INTEGER),
CAST(COUNT(DISTINCT `cc_sq_ft`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_sq_ft`)) AS BIGINT),
CAST(4.0 AS DOUBLE),
CAST(4.0 AS INTEGER),
CAST(MAX(`cc_sq_ft`) AS INTEGER),
CAST(MIN(`cc_sq_ft`) AS INTEGER),
CAST(COUNT(DISTINCT `cc_hours`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_hours`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_hours`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_hours`)) AS INTEGER),
CAST(MAX(`cc_hours`) AS VARCHAR),
CAST(MIN(`cc_hours`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_manager`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_manager`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_manager`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_manager`)) AS INTEGER),
CAST(MAX(`cc_manager`) AS VARCHAR),
CAST(MIN(`cc_manager`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_mkt_id`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_mkt_id`)) AS BIGINT),
CAST(4.0 AS DOUBLE),
CAST(4.0 AS INTEGER),
CAST(MAX(`cc_mkt_id`) AS INTEGER),
CAST(MIN(`cc_mkt_id`) AS INTEGER),
CAST(COUNT(DISTINCT `cc_mkt_class`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_mkt_class`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_mkt_class`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_mkt_class`)) AS INTEGER),
CAST(MAX(`cc_mkt_class`) AS VARCHAR),
CAST(MIN(`cc_mkt_class`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_mkt_desc`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_mkt_desc`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_mkt_desc`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_mkt_desc`)) AS INTEGER),
CAST(MAX(`cc_mkt_desc`) AS VARCHAR),
CAST(MIN(`cc_mkt_desc`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_market_manager`) AS BIGINT),
CAST(
(COUNT(1) - COUNT(`cc_market_manager`)) AS BIGINT
),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_market_manager`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_market_manager`)) AS INTEGER),
CAST(MAX(`cc_market_manager`) AS VARCHAR),
CAST(MIN(`cc_market_manager`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_division`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_division`)) AS BIGINT),
CAST(4.0 AS DOUBLE),
CAST(4.0 AS INTEGER),
CAST(MAX(`cc_division`) AS INTEGER),
CAST(MIN(`cc_division`) AS INTEGER),
CAST(COUNT(DISTINCT `cc_division_name`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_division_name`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_division_name`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_division_name`)) AS INTEGER),
CAST(MAX(`cc_division_name`) AS VARCHAR),
CAST(MIN(`cc_division_name`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_company`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_company`)) AS BIGINT),
CAST(4.0 AS DOUBLE),
CAST(4.0 AS INTEGER),
CAST(MAX(`cc_company`) AS INTEGER),
CAST(MIN(`cc_company`) AS INTEGER),
CAST(COUNT(DISTINCT `cc_company_name`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_company_name`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_company_name`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_company_name`)) AS INTEGER),
CAST(MAX(`cc_company_name`) AS VARCHAR),
CAST(MIN(`cc_company_name`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_street_number`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_street_number`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_street_number`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_street_number`)) AS INTEGER),
CAST(MAX(`cc_street_number`) AS VARCHAR),
CAST(MIN(`cc_street_number`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_street_name`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_street_name`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_street_name`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_street_name`)) AS INTEGER),
CAST(MAX(`cc_street_name`) AS VARCHAR),
CAST(MIN(`cc_street_name`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_street_type`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_street_type`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_street_type`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_street_type`)) AS INTEGER),
CAST(MAX(`cc_street_type`) AS VARCHAR),
CAST(MIN(`cc_street_type`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_suite_number`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_suite_number`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_suite_number`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_suite_number`)) AS INTEGER),
CAST(MAX(`cc_suite_number`) AS VARCHAR),
CAST(MIN(`cc_suite_number`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_city`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_city`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_city`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_city`)) AS INTEGER),
CAST(MAX(`cc_city`) AS VARCHAR),
CAST(MIN(`cc_city`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_county`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_county`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_county`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_county`)) AS INTEGER),
CAST(MAX(`cc_county`) AS VARCHAR),
CAST(MIN(`cc_county`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_state`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_state`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_state`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_state`)) AS INTEGER),
CAST(MAX(`cc_state`) AS VARCHAR),
CAST(MIN(`cc_state`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_zip`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_zip`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_zip`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_zip`)) AS INTEGER),
CAST(MAX(`cc_zip`) AS VARCHAR),
CAST(MIN(`cc_zip`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_country`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_country`)) AS BIGINT),
CAST(
AVG(CAST(CHAR_LENGTH(`cc_country`) AS DOUBLE)) AS DOUBLE
),
CAST(MAX(CHAR_LENGTH(`cc_country`)) AS INTEGER),
CAST(MAX(`cc_country`) AS VARCHAR),
CAST(MIN(`cc_country`) AS VARCHAR),
CAST(COUNT(DISTINCT `cc_gmt_offset`) AS BIGINT),
CAST((COUNT(1) - COUNT(`cc_gmt_offset`)) AS BIGINT),
CAST(8.0 AS DOUBLE),
CAST(8.0 AS INTEGER),
CAST(MAX(`cc_gmt_offset`) AS DOUBLE),
CAST(MIN(`cc_gmt_offset`) AS DOUBLE),
CAST(COUNT(DISTINCT `cc_tax_percentage`) AS BIGINT),
CAST(
(COUNT(1) - COUNT(`cc_tax_percentage`)) AS BIGINT
),
CAST(8.0 AS DOUBLE),
CAST(8.0 AS INTEGER),
CAST(MAX(`cc_tax_percentage`) AS DOUBLE),
CAST(MIN(`cc_tax_percentage`) AS DOUBLE)
FROM `bytedance_hive`.`tpc_ds`.`call_center`
{code}
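Every column contributes the same six statistics to this statement: distinct count, null count, average length, maximum length, maximum and minimum (for fixed-width types the two length statistics are constants such as 8.0), plus one leading COUNT(1). A hypothetical, simplified model of that select list makes the arithmetic explicit; the class and method names are illustrative only and do not exist in Flink.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the select list of the statistics query above.
final class AnalyzeSelectListSketch {

    // stringLike.get(i) == true: character column, length statistics are real aggregates.
    // stringLike.get(i) == false: fixed-width column, length statistics are constant literals.
    static List<String> selectList(List<String> columns, List<Boolean> stringLike) {
        List<String> exprs = new ArrayList<>();
        exprs.add("COUNT(1)");
        for (int i = 0; i < columns.size(); i++) {
            String c = "`" + columns.get(i) + "`";
            exprs.add("COUNT(DISTINCT " + c + ")");
            exprs.add("COUNT(1) - COUNT(" + c + ")");
            if (stringLike.get(i)) {
                exprs.add("AVG(CHAR_LENGTH(" + c + "))");
                exprs.add("MAX(CHAR_LENGTH(" + c + "))");
            } else {
                exprs.add("<avg length literal>");
                exprs.add("<max length literal>");
            }
            exprs.add("MAX(" + c + ")");
            exprs.add("MIN(" + c + ")");
        }
        return exprs;
    }

    // Counts the distinct aggregates, i.e. the expressions starting with COUNT(DISTINCT.
    static long countDistinctAggregates(List<String> exprs) {
        return exprs.stream().filter(e -> e.startsWith("COUNT(DISTINCT")).count();
    }
}
```

For the 31 columns of call_center this yields 1 + 31 × 6 = 187 select expressions, 31 of them COUNT(DISTINCT ...). The many distinct aggregates are presumably what makes the planner introduce the expand step (BatchExpand) ahead of the aggregation, so the size of both generated operators grows with the number of columns.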
For this ANALYZE TABLE statement, compiling the generated code on the TaskManager (TM) side fails with the following root-cause exception:
{code:java}
Caused by: org.codehaus.janino.InternalCompilerException: Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "BatchExpand$36757" grows beyond 64 KB
{code}
and
{code:java}
Caused by: org.codehaus.janino.InternalCompilerException: Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "LocalNoGroupingAggregateWithoutKeys$34429" grows beyond 64 KB
{code}
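Both exceptions come from the JVM class-file format, which limits the bytecode of any single method to less than 64 KB (65,536 bytes), whichever compiler produced it. As a sketch, assuming a full JDK so that ToolProvider.getSystemJavaCompiler() returns a compiler, the following reproduces the same limit with javac instead of Janino; the class Generated and its statement pattern are arbitrary illustrations, not Flink's generated code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class MethodSizeLimitDemo {

    // Generates a class whose single static method holds `statements` arithmetic statements.
    // Each statement compiles to about 10 bytes of bytecode, so 10,000 of them exceed 64 KB.
    static String generateSource(int statements) {
        StringBuilder sb = new StringBuilder("public class Generated {\n  public static long run(long x) {\n");
        for (int i = 0; i < statements; i++) {
            sb.append("    x = x * 31 + ").append(i % 1000).append(";\n");
        }
        sb.append("    return x;\n  }\n}\n");
        return sb.toString();
    }

    // Compiles the generated class with javac into a temp directory.
    // Prints any diagnostics and returns whether compilation succeeded.
    static boolean compiles(int statements) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system Java compiler (a JDK is required)");
        }
        String source = generateSource(statements);
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Generated.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        Path outDir;
        try {
            outDir = Files.createTempDirectory("generated-classes");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        boolean ok = javac.getTask(null, null, diagnostics,
                List.of("-d", outDir.toString()), null, List.of(file)).call();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            System.out.println(d.getKind() + ": " + d.getMessage(Locale.ROOT));
        }
        return ok;
    }
}
```

With this sketch, compiles(100) succeeds, while compiles(10_000) fails and javac reports a "code too large" error, its counterpart to Janino's "grows beyond 64 KB".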
We need to split the generated code of BatchExpand and LocalNoGroupingAggregateWithoutKeys into smaller methods for complex SQL, just as ExprCodeGenerator and AggsHandlerCodeGenerator already do.
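One way to do such splitting, shown only as a sketch and not as Flink's actual implementation, is to pack the generated statements greedily into private helper methods under a size budget and let processElement merely call them. MethodSplitter, the _splitN naming and the character budget (a rough proxy for bytecode size) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Flink API: spreads independent generated statements over
// several private methods so that no single method body grows too large.
final class MethodSplitter {

    // Greedily groups statements so each group's total source length stays within maxChars.
    // A single statement longer than maxChars still gets a group of its own.
    static List<List<String>> group(List<String> statements, int maxChars) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int size = 0;
        for (String stmt : statements) {
            if (!current.isEmpty() && size + stmt.length() > maxChars) {
                groups.add(current);
                current = new ArrayList<>();
                size = 0;
            }
            current.add(stmt);
            size += stmt.length();
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    // Emits the entry method, which only calls the split methods, followed by the split methods.
    static String generate(String methodName, String paramDecl, String argName,
                           List<String> statements, int maxChars) {
        List<List<String>> groups = group(statements, maxChars);
        StringBuilder calls = new StringBuilder();
        StringBuilder methods = new StringBuilder();
        for (int i = 0; i < groups.size(); i++) {
            String split = methodName + "_split" + i;
            calls.append("    ").append(split).append("(").append(argName).append(");\n");
            methods.append("  private void ").append(split).append("(").append(paramDecl)
                   .append(") throws Exception {\n");
            for (String s : groups.get(i)) {
                methods.append("    ").append(s).append("\n");
            }
            methods.append("  }\n");
        }
        return "  public void " + methodName + "(" + paramDecl + ") throws Exception {\n"
                + calls + "  }\n" + methods;
    }
}
```

This only works for statements that do not share local variables; a real splitter would first have to hoist such locals into member fields or pass them as parameters, which the sketch does not attempt.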
--
This message was sent by Atlassian Jira
(v8.3.4#803005)