Posted to issues-all@impala.apache.org by "Dan Hecht (JIRA)" <ji...@apache.org> on 2018/06/21 15:39:00 UTC
[jira] [Resolved] (IMPALA-7046) Add targeted regression test for race in IMPALA-7033

     [ https://issues.apache.org/jira/browse/IMPALA-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Hecht resolved IMPALA-7046.
-------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.1.0

commit 11aaa6caa0818a30db662a4b6f07147faf1f52b2
Author: Dan Hecht <dh...@cloudera.com>
Date:   Mon Jun 11 16:30:06 2018 -0700

    IMPALA-7046: introduce "global" debug_actions

    The motivation is to add jitter to backend startup in test_failpoints.
    The race in IMPALA-7033 can be reproduced by adding jitter to the exec
    rpcs when some backends fail. Let's add jitter to test_failpoints to get
    better coverage of exec startup races.

    This builds on top of the debug action extensions added in the async
    admission control patch by introducing new "global" debug actions
    (i.e. actions that can be triggered at points outside of the ExecNodes).
    See the code comments for details.

    For now, we're only using the SLEEP and JITTER commands, but I've
    included a FAIL command as well since I'll want to use that to write a
    test for IMPALA-6788 to simulate exec rpc failure.
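A minimal C++ sketch of executing one global debug action after it has been split into tokens shows what the three commands do. The command names SLEEP, JITTER and FAIL come from the commit message. Everything else is an assumption for illustration only: the token layout (e.g. {"JITTER", "100", "0.5"} read as "with probability 0.5, sleep a random 0..100 ms"), the hypothetical DebugStatus type and the hypothetical ExecuteDebugCommand function. None of these are Impala's actual API.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a status type; not Impala's Status class.
struct DebugStatus {
  bool ok;
  std::string msg;
};

// Parses a non-negative integer number of milliseconds; false on bad input.
static bool ParseMs(const std::string& s, long* out) {
  try {
    std::size_t pos = 0;
    long v = std::stol(s, &pos);
    if (pos != s.size() || v < 0) return false;
    *out = v;
    return true;
  } catch (const std::exception&) {
    return false;
  }
}

// Parses a probability in [0, 1]; false on bad input.
static bool ParseProbability(const std::string& s, double* out) {
  try {
    std::size_t pos = 0;
    double p = std::stod(s, &pos);
    if (pos != s.size() || p < 0.0 || p > 1.0) return false;
    *out = p;
    return true;
  } catch (const std::exception&) {
    return false;
  }
}

// Executes one tokenized command (assumed layout):
//   {"SLEEP", ms}               sleep for exactly ms milliseconds
//   {"JITTER", max_ms[, prob]}  with probability prob (default 1), sleep a
//                               uniformly random 0..max_ms milliseconds
//   {"FAIL"}                    return an injected error
DebugStatus ExecuteDebugCommand(const std::vector<std::string>& tokens) {
  if (tokens.empty()) return {false, "empty debug action"};
  const std::string& cmd = tokens[0];
  if (cmd == "SLEEP" && tokens.size() == 2) {
    long ms;
    if (!ParseMs(tokens[1], &ms)) return {false, "bad SLEEP argument"};
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
    return {true, ""};
  }
  if (cmd == "JITTER" && (tokens.size() == 2 || tokens.size() == 3)) {
    long max_ms;
    double prob = 1.0;
    if (!ParseMs(tokens[1], &max_ms)) return {false, "bad JITTER max"};
    if (tokens.size() == 3 && !ParseProbability(tokens[2], &prob)) {
      return {false, "bad JITTER probability"};
    }
    // One generator per thread, so concurrent callers don't share state.
    thread_local std::mt19937 rng{std::random_device{}()};
    if (std::bernoulli_distribution(prob)(rng)) {
      std::uniform_int_distribution<long> delay(0, max_ms);
      std::this_thread::sleep_for(std::chrono::milliseconds(delay(rng)));
    }
    return {true, ""};
  }
  if (cmd == "FAIL" && tokens.size() == 1) return {false, "injected failure"};
  return {false, "invalid debug action: " + cmd};
}
```

In this sketch, malformed or unknown commands produce an error status rather than being silently ignored, matching the commit's note that invalid debug_actions were checked.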

    Note that I don't bother resolving the actions ahead of time (like we do
    for ExecNode actions). It doesn't seem worth it since the resolution
    only needs to occur after we've matched the label and I don't expect the
    same label to be hit many times within a single thread. We can always
    optimize this later if needed.
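The lazy approach described above (compare labels first, then tokenize only the entry that matched) can be sketched in C++ as follows. The sketch assumes a string of '|'-separated entries of the form LABEL:CMD@ARG@ARG. That format and the names Split and FindGlobalDebugAction are hypothetical and are not taken from the Impala source.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Splits s on delim, keeping empty fields (e.g. "a@@b" -> {"a", "", "b"}).
static std::vector<std::string> Split(const std::string& s, char delim) {
  std::vector<std::string> out;
  std::size_t start = 0;
  while (true) {
    std::size_t end = s.find(delim, start);
    // When end == npos, substr's count is clamped, so this takes the rest.
    out.push_back(s.substr(start, end - start));
    if (end == std::string::npos) break;
    start = end + 1;
  }
  return out;
}

// Returns the command tokens of the first entry whose label equals `label`,
// or std::nullopt if none matches. Non-matching entries are only compared
// against the label; just the matched entry is split into tokens.
std::optional<std::vector<std::string>> FindGlobalDebugAction(
    const std::string& debug_actions, const std::string& label) {
  std::size_t start = 0;
  while (start < debug_actions.size()) {
    std::size_t end = debug_actions.find('|', start);
    if (end == std::string::npos) end = debug_actions.size();
    // Cheap check: does this entry begin with "<label>:"?
    if (end - start > label.size() &&
        debug_actions.compare(start, label.size(), label) == 0 &&
        debug_actions[start + label.size()] == ':') {
      std::size_t cmd_start = start + label.size() + 1;
      return Split(debug_actions.substr(cmd_start, end - cmd_start), '@');
    }
    start = end + 1;
  }
  return std::nullopt;
}
```

Because the label comparison is cheap and a given label is not expected to be hit many times per thread, re-scanning the string on each call is a reasonable trade-off, as the commit message argues.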

    Testing:
    - Verified that test_failpoints can reproduce the race in
      IMPALA-7033 by reverting that fix and testing.
    - Ran the modified tests and grepped the impalad log to see
      that the sleeps are still occurring.
    - Manually verified the global FAIL command (in a build with another patch).
    - Manually verified handling of invalid debug_actions (both ExecNode and global).

    Change-Id: I77663a539be18711a4f12c470ffd7474e3d69388
    Reviewed-on: http://gerrit.cloudera.org:8080/10690
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>

> Add targeted regression test for race in IMPALA-7033
> ----------------------------------------------------
>
>                 Key: IMPALA-7046
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7046
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend
>    Affects Versions: Impala 3.1.0
>            Reporter: Dan Hecht
>            Assignee: Dan Hecht
>            Priority: Major
>             Fix For: Impala 3.1.0
>
>
> I'd like to add a regression test to trigger the race in IMPALA-7033 more reliably, but it will involve doing some sleeps at specific places, so I'd like to add it after [~bikramjeet.vig] commits a change that provides some infrastructure for that.
> The race was:
> 1) Coordinator::Exec() takes a reference on the QueryState's ExecResources (incrementing its reference count).
> 2) Coordinator sends out exec rpc to non-coordinator backend.
> 3) Some non-coordinator backend sends a failure report which invokes HandleExecStateTransition, which drops the coordinator's reference to the exec resources.
> 4) Coordinator sends out exec rpc to coordinator backend, which takes the exec resources reference and releases it. We don't expect the reference count to become non-zero again after it has already gone through a cycle (i.e. dropped back to zero).
> The fix for this race is included in [https://gerrit.cloudera.org/#/c/10440]
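The four steps of the race can be replayed deterministically in a small model. This is a minimal C++ sketch under stated assumptions: ExecResourceRefs and ReplayRace are hypothetical names, not Impala's QueryState API. Where the real backend would presumably treat a revived count as an invariant violation, the model reports it by returning false from Acquire().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a reference count that must not become non-zero
// again once it has dropped to zero. Not Impala's QueryState.
class ExecResourceRefs {
 public:
  // Returns false (instead of incrementing) if the count already cycled to 0.
  bool Acquire() {
    if (released_to_zero_) return false;
    ++count_;
    return true;
  }
  void Release() {
    assert(count_ > 0);
    if (--count_ == 0) released_to_zero_ = true;
  }
  int count() const { return count_; }

 private:
  int count_ = 0;
  bool released_to_zero_ = false;
};

// Replays steps 1-4 from the issue description in that order and returns a
// log of what happened at each step.
std::vector<std::string> ReplayRace() {
  ExecResourceRefs refs;
  std::vector<std::string> log;
  // 1) Coordinator::Exec() takes the reference.
  refs.Acquire();
  log.push_back("coordinator acquired, count=" + std::to_string(refs.count()));
  // 2) Exec rpc goes to a non-coordinator backend (no refcount change).
  log.push_back("exec rpc sent to non-coordinator backend");
  // 3) A failure report drops the coordinator's reference.
  refs.Release();
  log.push_back("failure report released, count=" + std::to_string(refs.count()));
  // 4) Exec rpc to the coordinator backend tries to take a reference again.
  if (!refs.Acquire()) {
    log.push_back("coordinator backend acquire rejected: count already cycled to zero");
  } else {
    refs.Release();
    log.push_back("coordinator backend acquired and released");
  }
  return log;
}
```

The jitter added by this commit is meant to make the step 3 before step 4 interleaving show up more often in test_failpoints. The model simply forces that order.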



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
