Posted to issues@ratis.apache.org by "Josh Elser (JIRA)" <ji...@apache.org> on 2019/06/17 14:09:00 UTC

[jira] [Updated] (RATIS-591) All create log requests RPCs blocked

     [ https://issues.apache.org/jira/browse/RATIS-591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated RATIS-591:
-----------------------------
    Attachment: master-3.txt
                master-2.txt
                master-1.txt

> All create log requests RPCs blocked
> ------------------------------------
>
>                 Key: RATIS-591
>                 URL: https://issues.apache.org/jira/browse/RATIS-591
>             Project: Ratis
>          Issue Type: Bug
>          Components: LogService
>            Reporter: Josh Elser
>            Assignee: Vladimir Rodionov
>            Priority: Major
>         Attachments: master-1.txt, master-2.txt, master-3.txt
>
>
> I was trying out Rajeshbabu's new changes in RATIS-541 using the docker automation, but gave invalid options the first time, which caused the workers to exit (divide by zero).
> When I tried to rerun the VerificationTool, I found that it got stuck waiting for logs to be created. A thread dump from the active leader of the metadata quorum showed 150+ threads, all stuck waiting to acquire a write lock. However, no thread holds the lock they are all waiting on, which seems to me like a deadlock.
> It seems like we have some kind of bug where we orphan a lock that's still held. This doesn't happen normally, which makes me wonder whether it can happen when the leader changes. I'll attach the logs of the metadata quorum nodes from my local test. However, I bet this could be reproduced with adequate load.
> Can you take a look at this, Vlad?
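A minimal sketch of one way a write lock can be "orphaned" like this, assuming the lock is a java.util.concurrent ReentrantReadWriteLock. This is not the Ratis LogService code and does not claim to be the actual cause. The class OrphanedLockDemo and the methods buggyUpdate, safeUpdate, runOnWorker and canAcquire are all hypothetical. The buggy path acquires the write lock and releases it only on the happy path. If an exception (e.g. one raised during a leader change) escapes before the unlock, the lock stays held. If the owning thread then exits, every later writer blocks, and a thread dump would show waiters but no live owner.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class OrphanedLockDemo {
    // Hypothetical stand-in for a lock guarding metadata state (not Ratis code).
    static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

    // Buggy pattern: the write lock is released only if no exception is thrown.
    static void buggyUpdate(boolean fail) {
        LOCK.writeLock().lock();
        if (fail) {
            throw new IllegalStateException("simulated failure, e.g. during a leader change");
        }
        LOCK.writeLock().unlock();
    }

    // Safe pattern: unlock in finally so every exit path releases the lock.
    static void safeUpdate(boolean fail) {
        LOCK.writeLock().lock();
        try {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        } finally {
            LOCK.writeLock().unlock();
        }
    }

    // Runs the update on a worker thread that swallows the exception and then exits.
    static void runOnWorker(Runnable update) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                update.run();
            } catch (IllegalStateException ignored) {
                // The worker dies here; with buggyUpdate it still owns the write lock.
            }
        });
        worker.start();
        worker.join();
    }

    // Returns true if a different thread can take the write lock within 200 ms.
    static boolean canAcquire() throws InterruptedException {
        final boolean[] acquired = {false};
        Thread waiter = new Thread(() -> {
            try {
                if (LOCK.writeLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                    acquired[0] = true;
                    LOCK.writeLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        waiter.join(); // join() makes the write to acquired[0] visible here
        return acquired[0];
    }

    public static void main(String[] args) throws InterruptedException {
        runOnWorker(() -> safeUpdate(true));
        System.out.println("after safe failure, lock free: " + canAcquire());
        runOnWorker(() -> buggyUpdate(true));
        System.out.println("after buggy failure, lock free: " + canAcquire());
        System.out.println("write lock still held: " + LOCK.isWriteLocked());
    }
}
```

Running main should report the lock as free after the safe failure. After the buggy failure, it should report the lock as not acquirable and still write-locked, even though the thread that took it has exited. Wrapping every lock() in try/finally, or releasing the lock on every error path, avoids this class of bug.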



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)