Posted to issues@impala.apache.org by "Thomas Tauber-Marshall (JIRA)" <ji...@apache.org> on 2017/08/01 15:10:01 UTC

[jira] [Created] (IMPALA-5749) Race in coordinator hits DCHECK on 'num_remaining_backends_ > 0'

Thomas Tauber-Marshall created IMPALA-5749:
----------------------------------------------

             Summary: Race in coordinator hits DCHECK on 'num_remaining_backends_ > 0'
                 Key: IMPALA-5749
                 URL: https://issues.apache.org/jira/browse/IMPALA-5749
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.10.0
            Reporter: Thomas Tauber-Marshall
            Priority: Blocker


While running 'test_finst_cancel_when_query_complete' in a loop to reproduce a different issue, I discovered a race in Coordinator::UpdateBackendExecStatus that causes Impala to crash on 'DCHECK_GT(num_remaining_backends_, 0)'.

The problem is that only the first exec report received for a particular backend after it has completed is supposed to reach line 992, where we decrement 'num_remaining_backends_'. Per the comments, the BackendState::IsDone check on line 945 is supposed to guarantee this.

However, the check and the update aren't performed atomically. Two threads can enter UpdateBackendExecStatus at the same time for the same backend, both call BackendState::IsDone and find it false, and then both proceed to decrement num_remaining_backends_, with the second one hitting the DCHECK.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)