You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kylin.apache.org by "Yaqian Zhang (JIRA)" <ji...@apache.org> on 2019/07/16 02:50:00 UTC

[jira] [Commented] (KYLIN-4000) Merge task hang at step #5: Convert Cuboid Data to HFile

    [ https://issues.apache.org/jira/browse/KYLIN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885792#comment-16885792 ] 

Yaqian Zhang commented on KYLIN-4000:
-------------------------------------

Hi,Tuo Zhu

I tested automatic merging, but it didn't happen as you said.Maybe your resources is not enough.

You can check your environment and try again. If this problem still exists, please contact us and provide more details.

> Merge task hang at step #5: Convert Cuboid Data to HFile
> --------------------------------------------------------
>
>                 Key: KYLIN-4000
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4000
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.6.1
>            Reporter: Tuo Zhu
>            Assignee: Yaqian Zhang
>            Priority: Blocker
>
> Auto triggered merge tasks all hanged at step: Convert Cuboid Data to HFile. The last reduce step in "Kylin_HFile_Generator_mycube_Step".
> Hadoop UI shows job status:
>  attempt_1555603539483_1251_r_000000_0 78.56 RUNNING reduce > reduce
> From the syslog, following outputs are printed at first few hours:
> {code}
>  2019-05-10 22:27:48,699 INFO [IPC Server handler 27 on 46084] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1555603539483_1149_r_000000_0 is : 0.8629564
>  2019-05-10 22:28:24,743 INFO [Socket Reader #1 for port 46084|#1 for port 46084] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1555603539483_1149 (auth:SIMPLE)
>  2019-05-10 22:28:24,746 INFO [IPC Server handler 23 on 46084] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1555603539483_1149_r_000000_0 is : 0.86303306
>  2019-05-10 22:29:00,794 INFO [Socket Reader #1 for port 46084|#1 for port 46084] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1555603539483_1149 (auth:SIMPLE)
>  2019-05-10 22:29:00,796 INFO [IPC Server handler 20 on 46084] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1555603539483_1149_r_000000_0 is : 0.86305153
>  2019-05-10 22:29:36,833 INFO [Socket Reader #1 for port 46084|#1 for port 46084] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1555603539483_1149 (auth:SIMPLE)
>  2019-05-10 22:29:36,836 INFO [IPC Server handler 3 on 46084] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1555603539483_1149_r_000000_0 is : 0.8634469
>  2019-05-10 22:30:12,883 INFO [Socket Reader #1 for port 46084|#1 for port 46084] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1555603539483_1149 (auth:SIMPLE)
>  2019-05-10 22:30:12,887 INFO [IPC Server handler 18 on 46084] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1555603539483_1149_r_000000_0 is : 0.8643631
> {code}
>  After that, only ZK logs keeps printed following lines constantly:
> {code}
> 2019-05-11 00:02:23,445 WARN [Thread-1388778] org.apache.hadoop.hdfs.DataStreamer: These favored nodes were specified but not chosen: [fluorine:16020] Specified favored nodes: [fluorine:16020]
>  2019-05-11 00:02:23,452 INFO [main] org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient: Connect 0x29758ab9 to boron:2181,carbon:2181,neon:2181 with session timeout=90000ms, retries 6, retry interval 1000ms, keepAlive=60000ms
>  2019-05-11 00:02:23,452 INFO [ReadOnlyZKClient-boron:2181,carbon:2181,neon:2181@0x29758ab9] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=bo
>  ron:2181,carbon:2181,neon:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$24/171414890@19fbce98
>  2019-05-11 00:02:23,452 INFO [ReadOnlyZKClient-boron:2181,carbon:2181,neon:2181@0x29758ab9-SendThread(boron:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server boron/172.16.211.20:2181. Will not attempt to authenticate using SASL (unknown error)
>  2019-05-11 00:02:23,453 INFO [ReadOnlyZKClient-boron:2181,carbon:2181,neon:2181@0x29758ab9-SendThread(boron:2181)] org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /172.16.211.8:52214, server: boron/172.16.211.20:2181
>  2019-05-11 00:02:23,453 INFO [ReadOnlyZKClient-boron:2181,carbon:2181,neon:2181@0x29758ab9-SendThread(boron:2181)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server boron/172.16.211.20:2181, sessionid = 0x16a061f0770a354, negotiated timeout = 60000
>  2019-05-11 00:02:23,455 INFO [main] org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x29758ab9 to boron:2181,carbon:2181,neon:2181
>  2019-05-11 00:02:23,455 INFO [main] org.apache.hadoop.hbase.io.hfile.CacheConfig: Created cacheConfig: CacheConfig:disabled
>  2019-05-11 00:02:23,456 INFO [main] org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2: Writer=hdfs://berylium:8020/kylin/kylin_metadata/kylin-772502e8-c126-fec2-5808-08aa2df1fdea/pbs_uv_daily/hfile/_temporary/1/_temporary/attempt_1555603539483_1149_r_000000_0/F1/fdbf1bd8933242f2bd1da40a03c7a635, wrote=43
>  2019-05-11 00:02:23,457 WARN [Thread-1388781] org.apache.hadoop.hdfs.DataStreamer: These favored nodes were specified but not chosen: [fluorine:16020] Specified favored nodes: [fluorine:16020]
> {code}
> It lasts for tens of hours. No error, no exceptions. In the end, it created too many blocks in HDFS. It only happens when merge task is triggered. Daily build jobs went very good.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)