Posted to issues@kylin.apache.org by "Vsevolod Ostapenko (Jira)" <ji...@apache.org> on 2020/01/15 22:24:00 UTC

[jira] [Created] (KYLIN-4341) by-level cuboid intermediate files are left behind and not cleaned up after job is complete

Vsevolod Ostapenko created KYLIN-4341:
-----------------------------------------

             Summary: by-level cuboid intermediate files are left behind and not cleaned up after job is complete
                 Key: KYLIN-4341
                 URL: https://issues.apache.org/jira/browse/KYLIN-4341
             Project: Kylin
          Issue Type: Bug
          Components: Job Engine
    Affects Versions: v2.6.4
         Environment: Kylin 2.6.4, CentOS 7.6, HDP 2.6.5
            Reporter: Vsevolod Ostapenko


Setup: MR as the cube build engine, with the by-level cube build strategy (picked automatically).
After a cube segment build job completes, a number of intermediate files are left behind.
Namely, the output of the MR jobs that produce the base cuboid and the subsequent level cuboids, as well as rowkey_stats from the hfile creation step.
The files in question consume about the same amount of space in HDFS as the final hfile.
This leads to wasted space in HDFS that is not released for as long as the corresponding cube segment is online. The leaked space is released only when the segment is taken offline and cleaned up as part of segment retention.

Sample output is as follows.
{quote}$ hadoop fs -ls -R /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:44 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:26 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid
-rw-r--r-- 2 kylin hdfs 0 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/_SUCCESS
-rw-r--r-- 2 kylin hdfs 51570048 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-00000
-rw-r--r-- 2 kylin hdfs 51477377 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-00001
-rw-r--r-- 2 kylin hdfs 51615162 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-00002
-rw-r--r-- 2 kylin hdfs 51591031 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-00003
-rw-r--r-- 2 kylin hdfs 51648914 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-00004
-rw-r--r-- 2 kylin hdfs 51532761 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-00005
-rw-r--r-- 2 kylin hdfs 51455652 2020-01-07 04:35 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-00006
-rw-r--r-- 2 kylin hdfs 51552752 2020-01-07 04:36 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-00007
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid
-rw-r--r-- 2 kylin hdfs 0 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/_SUCCESS
-rw-r--r-- 2 kylin hdfs 16293012 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-00000
-rw-r--r-- 2 kylin hdfs 16283730 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-00001
-rw-r--r-- 2 kylin hdfs 16288965 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-00002
-rw-r--r-- 2 kylin hdfs 16270572 2020-01-07 04:25 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-00003
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:23 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/rowkey_stats
-rw-r--r-- 3 kylin hdfs 155 2020-01-07 04:23 /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/rowkey_stats/part-r-00000_hfile
{quote}
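
To quantify the leaked space, a listing like the one above can be totalled per directory. The following is a minimal sketch in Python, not part of Kylin. It assumes the usual `hadoop fs -ls -R` column layout (permissions, replication, owner, group, size, date, time, path), and the function name `leftover_bytes_by_dir` is hypothetical.

```python
from collections import defaultdict


def leftover_bytes_by_dir(listing: str) -> dict:
    """Sum file sizes per parent directory from `hadoop fs -ls -R` output.

    Assumes 8 whitespace-separated columns per entry:
    permissions, replication, owner, group, size, date, time, path.
    """
    totals = defaultdict(int)
    for line in listing.splitlines():
        fields = line.split(None, 7)
        # Keep only regular files: 8 columns, permissions starting with '-',
        # and a numeric size column. Directories and other lines are skipped.
        if len(fields) != 8 or not fields[0].startswith("-"):
            continue
        if not fields[4].isdigit():
            continue
        size, path = int(fields[4]), fields[7]
        parent = path.rsplit("/", 1)[0]
        totals[parent] += size
    return dict(totals)
```

The sizes reported are logical file sizes. The raw HDFS space consumed is larger by the replication factor shown in the second column of each file entry.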
 

Removing the job metadata (metastore.sh clean --jobThreshold Ndays) does not help: information about the job is removed, but no intermediate files are cleaned up.

Storage cleanup does not work either (kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true), because the corresponding segment is still online.

 

It looks like cleanup of the intermediate files generated by the MR jobs as part of the base and level cuboid builds is either not implemented or commented out for some reason.


