Posted to issues@kylin.apache.org by "Andras Istvan Nagy (Jira)" <ji...@apache.org> on 2020/01/24 16:43:00 UTC

[jira] [Commented] (KYLIN-4348) Fix distributed concurrency lock bug

    [ https://issues.apache.org/jira/browse/KYLIN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023085#comment-17023085 ] 

Andras Istvan Nagy commented on KYLIN-4348:
-------------------------------------------

I was about to file a new ticket, but this one may cover the same issue we are seeing.

It seems that the patch for https://issues.apache.org/jira/browse/KYLIN-4165 introduced a problem with distributed locking in our environment. After some time, many "STREAM CUBE" jobs get stuck at 0% without making progress, and no corresponding jobs appear in YARN at all. New jobs then start piling up, because there are already 10 in the running state.

At the same time, I see this in the log, which hints at an issue with the locking implementation:
{code:java}
2020-01-20 10:45:46 INFO  MapReduceExecutable:409 - 488ee680-2d37-9c8d-f5bd-82d07df51869-00, parent lock path(/cube_job_lock/cube_jw_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_jw_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:45:46 INFO  MapReduceExecutable:409 - 00358b93-5368-f746-17d9-6a95a8144f73-00, parent lock path(/cube_job_lock/cube_tm_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_tm_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:45:46 INFO  MapReduceExecutable:409 - 452d4c95-c65e-9707-dee7-94be920ba319-00, parent lock path(/cube_job_lock/cube_tm_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_tm_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:45:46 INFO  MapReduceExecutable:409 - 4e73dab2-efe3-5257-5990-46295a4e564d-00, parent lock path(/cube_job_lock/cube_jw_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_jw_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:45:46 INFO  MapReduceExecutable:409 - 232765e7-f171-0e6c-d722-a0d5933e7400-00, parent lock path(/cube_job_lock/cube_tm_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_tm_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:45:46 INFO  MapReduceExecutable:409 - 4c734f3d-bc40-0ce8-3a8e-943a3524a57a-00, parent lock path(/cube_job_lock/cube_jw_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_jw_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:45:55 INFO  MapReduceExecutable:409 - 90d6eda0-a55f-374a-3419-178ef328416a-00, parent lock path(/cube_job_lock/cube_tm_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_tm_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:45:58 INFO  MapReduceExecutable:409 - 582553e4-36b4-54af-23ed-15b36b4154bf-00, parent lock path(/cube_job_lock/cube_jw_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_jw_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:46:02 INFO  MapReduceExecutable:409 - 6d13c89f-f69f-4345-ab28-03afb7a7ac88-00, parent lock path(/cube_job_lock/cube_jw_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_jw_v2 is locked by other job result is true,will try after one minute
{code}
After reverting the patch for KYLIN-4165, the issue disappeared.
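
For anyone checking whether their own logs show the same pattern, here is a minimal sketch that counts how many distinct job steps are waiting on each parent lock path. It is not part of Kylin: the function name `count_waiting_jobs` is hypothetical, and the regular expression is an assumption inferred only from the log lines quoted above, so it may need adjusting for other versions or log layouts.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this comment.
# This is an assumption, not a log format documented by Kylin.
WAIT_RE = re.compile(
    r"MapReduceExecutable:\d+ - (?P<job>[0-9a-f-]+), "
    r"parent lock path\((?P<path>[^)]+)\) is locked by other job"
)


def count_waiting_jobs(lines):
    """Map each parent lock path to the number of distinct waiting job steps."""
    seen = set()
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            # A job step retries every minute, so de-duplicate by (path, job).
            seen.add((match.group("path"), match.group("job")))
    return Counter(path for path, _ in seen)
```

Passing an open log file (or any iterable of lines) returns a `Counter`. If the count for a lock path keeps growing while nothing for that cube is running in YARN, that would be consistent with the stuck state described here.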

cc [~wangxiaojing] - have you seen the same issue?

> Fix distributed concurrency lock bug
> ------------------------------------
>
>                 Key: KYLIN-4348
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4348
>             Project: Kylin
>          Issue Type: Sub-task
>            Reporter: wangxiaojing
>            Assignee: wangxiaojing
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)