Posted to issues@kylin.apache.org by "Iñigo Martinez (JIRA)" <ji...@apache.org> on 2018/09/11 10:44:00 UTC

[jira] [Commented] (KYLIN-3555) Garbage collection on HBase step fails with S3 selected as storage

    [ https://issues.apache.org/jira/browse/KYLIN-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610423#comment-16610423 ] 

Iñigo Martinez commented on KYLIN-3555:
---------------------------------------

This problem is NOT present in 2.4.0, or at least it does not raise an exception there.
{code:java}
2018-09-11 12:39:06,197 DEBUG [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: s3://XXXXXXXX-emr-kylin
2018-09-11 12:39:06,248 DEBUG [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] steps.HDFSPathGarbageCollectionStep:90 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns not exists.
2018-09-11 12:39:06,390 DEBUG [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] steps.HDFSPathGarbageCollectionStep:90 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/hfile not exists.
2018-09-11 12:39:06,500 DEBUG [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: s3://XXXXXXXX-emr-kylin
2018-09-11 12:39:06,505 DEBUG [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] steps.HDFSPathGarbageCollectionStep:90 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns not exists.
2018-09-11 12:39:06,552 DEBUG [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] steps.HDFSPathGarbageCollectionStep:90 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/hfile not exists.
2018-09-11 12:39:06,652 INFO [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] execution.ExecutableManager:411 : job id:f8416975-eea6-4500-9cb7-4374f28451dc-15 from RUNNING to SUCCEED
2018-09-11 12:39:06,695 INFO [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] execution.ExecutableManager:411 : job id:f8416975-eea6-4500-9cb7-4374f28451dc from RUNNING to SUCCEED
2018-09-11 12:39:06,696 DEBUG [Scheduler 2125314127 Job f8416975-eea6-4500-9cb7-4374f28451dc-125] execution.AbstractExecutable:310 : no need to send email, user list is empty
{code}

> Garbage collection on HBase step fails with S3 selected as storage
> ------------------------------------------------------------------
>
>                 Key: KYLIN-3555
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3555
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.4.1
>            Reporter: Iñigo Martinez
>            Priority: Major
>              Labels: build
>         Attachments: Screenshot from 2018-09-11 12-31-25.png
>
>
> When building a cube with S3 selected as storage, the build process fails at the last step.
> Although S3 has been defined as storage, the cleanup task tries to delete from HDFS and, of course, the files do not exist on HDFS.
>  
> {code:java}
> 2018-09-11 12:27:56,311 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: s3://XXXXXXX-emr-kylin
> 2018-09-11 12:27:57,364 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:87 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns is dropped.
> 2018-09-11 12:27:58,104 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:87 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/hfile is dropped.
> 2018-09-11 12:27:58,140 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: hdfs://ip-10-0-1-63.eu-west-1.compute.internal:8020
> 2018-09-11 12:27:58,142 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:90 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns not exists.
> 2018-09-11 12:27:58,147 ERROR [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:68 : job:f8416975-eea6-4500-9cb7-4374f28451dc-15 execute finished with exception
> java.io.FileNotFoundException: File /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1 does not exist.
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:904)
> at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
> at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:964)
> at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:961)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:971)
> at org.apache.kylin.storage.hbase.steps.HDFSPathGarbageCollectionStep.dropHdfsPathOnCluster(HDFSPathGarbageCollectionStep.java:95)
> at org.apache.kylin.storage.hbase.steps.HDFSPathGarbageCollectionStep.doWork(HDFSPathGarbageCollectionStep.java:65)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:69)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748){code}
>  
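
The trace above shows listStatus being called on the job's parent directory, which does not exist on the HDFS cluster, right after the child path was reported as "not exists". A minimal sketch of the kind of guard that would avoid the exception is below. It uses java.nio on a local filesystem rather than the Hadoop FileSystem API so that it runs standalone. The class name, the method name dropIfEmpty, and the assumption that the listing is there to remove the parent when it is empty are all hypothetical and not taken from the Kylin source.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only: not Kylin code, and not the Hadoop API.
public class SafeCleanupSketch {

    /**
     * Removes {@code parent} only if it exists and has no children.
     * Returns true if the directory was removed, false otherwise.
     */
    public static boolean dropIfEmpty(Path parent) throws IOException {
        // Guard: listing a missing directory throws, so check existence first.
        if (!Files.isDirectory(parent)) {
            return false;
        }
        try (DirectoryStream<Path> children = Files.newDirectoryStream(parent)) {
            if (children.iterator().hasNext()) {
                return false; // still has content, leave it alone
            }
        }
        Files.delete(parent);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("gc-sketch");
        Path missing = tmp.resolve("does-not-exist");
        System.out.println(dropIfEmpty(missing)); // false, no exception thrown
        System.out.println(dropIfEmpty(tmp));     // true, empty directory removed
    }
}
```

With a guard like this, a parent directory that is absent on one of the filesystems is simply skipped instead of failing the whole job step.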


