Posted to issues@kylin.apache.org by "Shaofeng SHI (JIRA)" <ji...@apache.org> on 2015/09/06 08:38:45 UTC
[jira] [Assigned] (KYLIN-998) Finish the hive intermediate table
clean up job in org.apache.kylin.job.hadoop.cube.StorageCleanupJob
[ https://issues.apache.org/jira/browse/KYLIN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shaofeng SHI reassigned KYLIN-998:
----------------------------------
Assignee: Shaofeng SHI (was: hongbin ma)
> Finish the hive intermediate table clean up job in org.apache.kylin.job.hadoop.cube.StorageCleanupJob
> -----------------------------------------------------------------------------------------------------
>
> Key: KYLIN-998
> URL: https://issues.apache.org/jira/browse/KYLIN-998
> Project: Kylin
> Issue Type: Improvement
> Components: Storage - HBase
> Affects Versions: v0.7.2, v0.7.1
> Reporter: nichunen
> Assignee: Shaofeng SHI
> Fix For: v1.1
>
> Attachments: patch1.zip
>
>
> Currently, the last step of a kylin cube building job, named “Garbage Collection”, removes the intermediate data in hdfs/hbase/hive. But if the job stops unexpectedly (for example because of a problem in the hadoop cluster, a bad cube design, or being discarded by the user), the data is left undeleted.
> In such cases, we can run "hbase org.apache.hadoop.util.RunJar $KYLIN_HOME/lib/kylin-job-0.8.1-incubating-SNAPSHOT.jar org.apache.kylin.job.hadoop.cube.StorageCleanupJob --delete true" to remove the data. But the method "cleanUnusedIntermediateHiveTable" is unfinished.
> My first patch finishes the method; it removes unused hive tables whose names begin with "kylin_intermediate_".
> My second patch adds some methods that enable deleting unused data by uuid, given either on the command line or stored in a file.
> I don't know whether the second patch is useful to you; we use it on our kylin server to remove data after a cube is deleted.
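For readers without the attachment, the selection logic described for the first patch can be sketched roughly as follows. This is not the code in patch1.zip and not Kylin's actual implementation: the class name, both method names, and the idea of passing in a set of tables still in use are assumptions made for illustration. Only the "kylin_intermediate_" prefix comes from the description above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Kylin.
class IntermediateTableCleanupSketch {

    // Prefix named in the issue description.
    static final String PREFIX = "kylin_intermediate_";

    // Returns the tables that carry the intermediate prefix and are not
    // referenced by any job that is still active (the inUse set is assumed
    // to be collected by the caller).
    static List<String> selectUnused(Collection<String> allTables, Set<String> inUse) {
        List<String> unused = new ArrayList<>();
        for (String table : allTables) {
            if (table.startsWith(PREFIX) && !inUse.contains(table)) {
                unused.add(table);
            }
        }
        return unused;
    }

    // Builds one HiveQL DROP statement per table; executing the statements
    // (for example through the hive CLI) is left to the caller.
    static List<String> toDropStatements(List<String> tables) {
        List<String> statements = new ArrayList<>();
        for (String table : tables) {
            statements.add("DROP TABLE IF EXISTS " + table + ";");
        }
        return statements;
    }
}
```

Tables without the prefix are never selected, so ordinary Hive tables stay untouched even when the in-use set is empty.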
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)