Posted to issues@carbondata.apache.org by "Indhumathi (Jira)" <ji...@apache.org> on 2022/02/14 15:59:00 UTC
[jira] [Resolved] (CARBONDATA-4322) Insert into local sort partition table select * from text table launch thousands tasks

     [ https://issues.apache.org/jira/browse/CARBONDATA-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Indhumathi resolved CARBONDATA-4322.
------------------------------------
    Fix Version/s: 2.3.0
       Resolution: Fixed

> Insert into local sort partition table select * from text table launch thousands tasks
> --------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-4322
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4322
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: SHREELEKHYA GAMPA
>            Priority: Major
>             Fix For: 2.3.0
>
>          Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> [Reproduce steps]
>  # CREATE TABLE partitionthree1 (empno int, doj Timestamp, workgroupcategoryname String, deptno int, deptname String, projectcode int, projectjoindate Timestamp, projectenddate Timestamp,attendance int, utilization int,salary int, empname String, designation String) PARTITIONED BY (workgroupcategory int) STORED AS carbondata tblproperties('sort_scope'='local_sort', 'sort_columns'='deptname,empname');
>  # CREATE TABLE partitionthree2 (empno int, doj Timestamp, workgroupcategoryname String, deptno int, deptname String, projectcode int, projectjoindate Timestamp, projectenddate Timestamp,attendance int, utilization int,salary int, empname String, designation String) PARTITIONED BY (workgroupcategory int);
>  # LOAD DATA local inpath 'hdfs://hacluster/user/data.csv' INTO TABLE partitionthree1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"', 'TIMESTAMPFORMAT'='dd-MM-yyyy');
>  # set hive.exec.dynamic.partition.mode=nonstrict;
>  # insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
>  # insert into partitionthree1 select * from partitionthree2;
>  
> [Expected Result]
> Step 6 should launch only as many tasks as there are nodes.
>  
> [Current Behavior]
> The number of tasks launched is far larger than the number of nodes.
>  
> [Impact]
> At several product sites, query performance is significantly impacted.
>  
> [Initial analysis]
> An insert into a non-partitioned local sort table launches as many tasks as there are nodes; inserts into a partition table should behave the same way.
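
The idea in the initial analysis, one load task per node rather than one per input split, can be illustrated with a small sketch. This is not CarbonData code: the function group_splits_by_node, the (path, host) split tuples, and the host names are all hypothetical, and the actual fix in 2.3.0 may work differently.

```python
from collections import defaultdict


def group_splits_by_node(splits):
    """Group (path, host) input splits so that each host gets a single task.

    Hypothetical illustration only; not CarbonData's actual implementation.
    """
    tasks = defaultdict(list)
    for path, host in splits:
        # Every split located on the same host joins that host's one task.
        tasks[host].append(path)
    return dict(tasks)


# Assumed example data: 16 small files spread across 2 nodes.
# Launching one task per file would mean 16 tasks.
splits = [(f"part-{i}.txt", f"node{i % 2}") for i in range(16)]
tasks = group_splits_by_node(splits)
print(len(splits), "splits ->", len(tasks), "tasks")
```

With this grouping the task count follows the number of nodes, so repeating the insert in step 5 adds more files per task instead of more tasks.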



--
This message was sent by Atlassian Jira
(v8.20.1#820001)