Posted to issues@ignite.apache.org by "Andrey Mashenkov (Jira)" <ji...@apache.org> on 2022/10/24 16:47:00 UTC

[jira] [Updated] (IGNITE-17964) Potential deadlock in discovery thread while updating SQL statistics.

     [ https://issues.apache.org/jira/browse/IGNITE-17964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Mashenkov updated IGNITE-17964:
--------------------------------------
    Description: 
On node start/activation, IgniteStatisticsConfigurationManager initializes and tries to clean up orphaned records (e.g. for tables that were dropped before the node stopped or crashed).
To do that, the *stat-mgmt* thread updates the distributed metastorage synchronously under the read-lock.
Underneath, the metastorage sends a request via discovery. The discovery component then
receives the answer to that message and gets stuck trying to acquire the write-lock to complete the future.
So the *stat-mgmt* and *disco-notify* threads fall into an inevitable deadlock: the first waits for the future while holding the read-lock, and the second cannot complete the future until the read-lock is released.

We should avoid any synchronous operation on distributed metastorage under the read-lock.

Let’s replace the synchronous CAS deep inside the closure (see IgniteStatisticsConfigurationManager.updateLocalStatistics) with an async CAS and pull its future up, outside both the closure and the read-lock.
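
The proposed change can be illustrated with a minimal, self-contained sketch. It is not Ignite code: the class AsyncCasSketch, the methods casAsync, waitUnderReadLock and waitOutsideReadLock, and the key name are hypothetical, and a single-thread executor stands in for the *disco-notify* thread. It only models the locking pattern described above, where the future can be completed only after the write-lock is acquired.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class AsyncCasSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Stand-in for the disco-notify thread (assumption: one thread handles all answers).
    private final ExecutorService discoNotify = Executors.newSingleThreadExecutor();

    /** Hypothetical async CAS: the "discovery" thread must take the write-lock before completing the future. */
    CompletableFuture<Boolean> casAsync(String key) {
        CompletableFuture<Boolean> fut = new CompletableFuture<>();
        discoNotify.submit(() -> {
            lock.writeLock().lock(); // blocks while any reader still holds the lock
            try {
                fut.complete(true);
            } finally {
                lock.writeLock().unlock();
            }
        });
        return fut;
    }

    /** Problematic pattern: waiting for the future while holding the read-lock. Returns false on timeout. */
    boolean waitUnderReadLock(String key) throws Exception {
        lock.readLock().lock();
        try {
            // A bounded wait replaces the unbounded one so the sketch terminates.
            return casAsync(key).get(200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return false; // the future cannot complete while we hold the read-lock
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Proposed pattern: start the CAS under the read-lock, wait for its future only after releasing it. */
    boolean waitOutsideReadLock(String key) throws Exception {
        CompletableFuture<Boolean> fut;
        lock.readLock().lock();
        try {
            fut = casAsync(key); // only the future leaves the locked section
        } finally {
            lock.readLock().unlock();
        }
        return fut.get(5, TimeUnit.SECONDS);
    }

    void shutdown() {
        discoNotify.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        AsyncCasSketch sketch = new AsyncCasSketch();
        try {
            // "stat.obj.dropped_table" is a made-up key used only for illustration.
            System.out.println("wait under read-lock completed: " + sketch.waitUnderReadLock("stat.obj.dropped_table"));
            System.out.println("wait outside read-lock completed: " + sketch.waitOutsideReadLock("stat.obj.dropped_table"));
        } finally {
            sketch.shutdown();
        }
    }
}
```

In this model the first call times out because the writer cannot proceed while the caller holds the read-lock, whereas the second call completes once the read-lock has been released. The real fix would need the same property: nothing may block on the metastorage future inside the read-locked section.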


  was:
On node start/activation IgniteStatisticsConfigurationManager initializes and tries to cleanup orphaned records (e.g. for tables, which were dropped before node stop/crash).
To do that *stat-mgmt* thread updates distributed metastorage synchronously under the read-lock.
Underneath, metastorage sends a request via discovery, then 
discovery component gets the answer on that message, and gets stuck trying to get the write-lock to complete the future... 
So, *stat-mgmt* and *disco-notify* thread fall into inevitable deadlock.

We should avoid any synchronous operation on distributed metastorage under the read-lock.

Let’s rewrite synchronous CAS deep inside the closure (see IgniteStatisticsConfigurationManager.updateLocalStatistics) to async CAS and pull it's future up to outside the closure and the read-lock.



> Potential deadlock in discovery thread while updating SQL statistics.
> ---------------------------------------------------------------------
>
>                 Key: IGNITE-17964
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17964
>             Project: Ignite
>          Issue Type: Bug
>          Components: sql
>            Reporter: Andrey Mashenkov
>            Assignee: Andrey Mashenkov
>            Priority: Major
>
> On node start/activation, IgniteStatisticsConfigurationManager initializes and tries to clean up orphaned records (e.g. for tables that were dropped before the node stopped or crashed).
> To do that, the *stat-mgmt* thread updates the distributed metastorage synchronously under the read-lock.
> Underneath, the metastorage sends a request via discovery. The discovery component then
> receives the answer to that message and gets stuck trying to acquire the write-lock to complete the future.
> So the *stat-mgmt* and *disco-notify* threads fall into an inevitable deadlock.
> We should avoid any synchronous operation on the distributed metastorage under the read-lock.
> Let’s replace the synchronous CAS deep inside the closure (see IgniteStatisticsConfigurationManager.updateLocalStatistics) with an async CAS and pull its future up, outside both the closure and the read-lock.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)