Posted to issues-all@impala.apache.org by "Pooja Nilangekar (JIRA)" <ji...@apache.org> on 2019/01/08 05:05:00 UTC
[jira] [Commented] (IMPALA-8007) test_slow_subscriber is flaky

    [ https://issues.apache.org/jira/browse/IMPALA-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736730#comment-16736730 ] 

Pooja Nilangekar commented on IMPALA-8007:
------------------------------------------

I was looking for reasons why secs_since_heartbeat could be lower than the sleep time, and I found the following in the Python docs:
{code:java}
The actual suspension time may be less than that requested because any caught signal will terminate the sleep() following execution of that signal’s catching routine. Also, the suspension time may be longer than requested by an arbitrary amount because of the scheduling of other activity in the system.

Changed in version 3.5: The function now sleeps at least secs even if the sleep is interrupted by a signal, except if the signal handler raises an exception (see PEP 475 for the rationale).
{code}
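To illustrate the documented behaviour, here is a minimal sketch of a helper that keeps sleeping until the requested time has really elapsed, which is roughly the guarantee that Python 3.5+ gives for signal interruptions. The name sleep_at_least is hypothetical and is not part of the test or of the time module. The sketch uses time.monotonic(), which needs Python 3.3+; on Python 2 it would have to fall back to time.time().

```python
import time


def sleep_at_least(duration_secs):
    """Hypothetical helper: block until at least duration_secs have elapsed.

    time.sleep() may return early (for example when a signal is caught on
    Python versions before 3.5), so re-check the clock and sleep again for
    whatever time remains.
    """
    deadline = time.monotonic() + duration_secs
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return
        time.sleep(remaining)


if __name__ == "__main__":
    start = time.monotonic()
    sleep_at_least(0.05)
    print("slept for %.4f seconds" % (time.monotonic() - start))
```

This only bounds the sleep from below. The sleep can still run longer than requested because of scheduling, as the quoted documentation notes.
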
As per [~tarmstrong]'s suggestion, I modified the test to check that secs_since_heartbeat is always greater than the previous reading. This test also fails, because the statestore's web UI reports the duration with millisecond precision while the sleep function can wake up in less than one millisecond, so two consecutive readings can be identical. So there are two options here:
 # Check for a non-decreasing duration (each reading greater than or equal to the previous one) instead of a strictly increasing duration. This would fix the test failure, but I am not sure that it would actually validate the correctness of the monitoring thread. (A thread could return the exact same value for several seconds or minutes and that would still be accepted by the test.)
 # Upgrade Python (and with it the "time" module) to 3.5 or higher, where sleep() is documented to sleep for at least the requested time even when interrupted by a signal, and then check for a strictly increasing duration since heartbeat. This might affect other places where "time" is imported, but it would actually validate the heartbeat monitoring thread.
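
The difference between the two checks can be sketched as follows. This is only an illustration, not the code in test_statestore.py: the helper names and the sample readings are made up, and the web UI's millisecond precision is modelled here by truncating microsecond values.

```python
def is_strictly_increasing(samples):
    """True if every reading is greater than the one before it."""
    return all(a < b for a, b in zip(samples, samples[1:]))


def is_non_decreasing(samples):
    """True if no reading is smaller than the one before it."""
    return all(a <= b for a, b in zip(samples, samples[1:]))


# Hypothetical durations since the last heartbeat, in microseconds.
# The first two readings are less than one millisecond apart.
raw_us = [5000400, 5000900, 5001300]

# Model millisecond precision by dropping the sub-millisecond part.
reported_ms = [us // 1000 for us in raw_us]  # [5000, 5000, 5001]

# A value that never advances, as a broken monitoring thread might report.
stuck_ms = [5000, 5000, 5000]

if __name__ == "__main__":
    print(is_strictly_increasing(reported_ms))  # False: 5000 repeats
    print(is_non_decreasing(reported_ms))       # True
    print(is_non_decreasing(stuck_ms))          # True: a stuck value passes
```

The strict check rejects valid readings that collapse to the same millisecond, while the relaxed check also accepts a value that never advances. That is the trade-off described in the first option.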

Tim, what do you suggest? 

> test_slow_subscriber is flaky
> -----------------------------
>
>                 Key: IMPALA-8007
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8007
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: bharath v
>            Assignee: Pooja Nilangekar
>            Priority: Major
>              Labels: broken-build, flaky
>             Fix For: Impala 3.2.0
>
>
> We have hit both the asserts in the test.
> *Exhaustive:*
> {noformat}
> statestore/test_statestore.py:574: in test_slow_subscriber     assert (secs_since_heartbeat < float(sleep_time + 1.0)) E   assert 8.8040000000000003 < 6.0 E    +  where 6.0 = float((5 + 1.0))
> Stacktrace
> statestore/test_statestore.py:574: in test_slow_subscriber
>     assert (secs_since_heartbeat < float(sleep_time + 1.0))
> E   assert 8.8040000000000003 < 6.0
> E    +  where 6.0 = float((5 + 1.0))
> {noformat}
> *ASAN*
> {noformat}
> Error Message
> statestore/test_statestore.py:573: in t     assert (secs_since_heartbeat > float(sleep_time - 1.0)) E   assert 4.995 > 5.0 E    +  where 5.0 = float((6 - 1.0))
> Stacktrace
> statestore/test_statestore.py:573: in test_slow_subscriber
>     assert (secs_since_heartbeat > float(sleep_time - 1.0))
> E   assert 4.995 > 5.0
> E    +  where 5.0 = float((6 - 1.0))
> {noformat}
> I have only noticed this happen twice (the above two instances) since the patch was committed, so it looks like a race condition.


