Posted to dev@flink.apache.org by "Jun Zhang (Jira)" <ji...@apache.org> on 2020/03/27 02:18:00 UTC

[jira] [Created] (FLINK-16818) Optimize data skew when Flink writes data to a Hive dynamic partition table

Jun Zhang created FLINK-16818:
---------------------------------

             Summary: Optimize data skew when Flink writes data to a Hive dynamic partition table
                 Key: FLINK-16818
                 URL: https://issues.apache.org/jira/browse/FLINK-16818
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / Hive
    Affects Versions: 1.10.0
            Reporter: Jun Zhang
             Fix For: 1.11.0


I read data from a Hive source table through Flink SQL and then write it into a Hive target table. The target table is partitioned. When one partition holds a particularly large share of the data, data skew occurs, resulting in a particularly long execution time.

With the default configuration, the same SQL takes about five minutes on Hive on Spark but about 40 minutes on Flink.

Example:

{code:sql}
-- DDL for the partitioned target table myparttable
CREATE TABLE myparttable (
  name string,
  age int
) PARTITIONED BY (
  type string,
  day string
);

-- Flink SQL dynamic partition insert
INSERT OVERWRITE myparttable SELECT name, age, type, day FROM sourcetable;
{code}
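
For comparison, a common workaround on the Hive/Spark side is to salt the distribution key so that the rows of one hot partition are spread across multiple reducers instead of landing on a single writer. The following is a minimal sketch reusing myparttable and sourcetable from the example above; DISTRIBUTE BY with a rand()-based salt is Hive-style syntax, not something Flink SQL 1.10 supports, and the salt factor of 10 is purely illustrative:

{code:sql}
-- Hive-style skew mitigation: append a random salt to the distribution key
-- so rows of one large dynamic partition fan out over several reducers.
INSERT OVERWRITE TABLE myparttable PARTITION (type, day)
SELECT name, age, type, day
FROM sourcetable
-- cast(rand() * 10 AS int) splits each (type, day) partition into up to
-- 10 buckets; 10 is an assumed salt factor, tune it to the cluster.
DISTRIBUTE BY type, day, cast(rand() * 10 AS int);
{code}

Each reducer then writes only a subset of the skewed partition's rows, at the cost of more output files per partition. An equivalent mechanism on the Flink side would avoid the single hot writer in the dynamic partition case.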
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)