Posted to issues@ignite.apache.org by "Ivan Veselovsky (JIRA)" <ji...@apache.org> on 2016/05/23 11:04:12 UTC

[jira] [Commented] (IGNITE-3177) [Test] IgfsSizeSelfTest.testReplicated sometimes fails with a timeout.

    [ https://issues.apache.org/jira/browse/IGNITE-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296230#comment-15296230 ] 

Ivan Veselovsky commented on IGNITE-3177:
-----------------------------------------

The following 2 lines of stderr of build #4842 are of interest:
{code}
[13:35:08,617][ERROR][main][root] Test has been timed out and will be interrupted (threads dump will be taken before interruption) [test=testReplicated, timeout=300000]
[13:38:33,207][WARN ][main][IgfsSizeSelfTest0] Dumping debug info for node [id=7532609d-88b2-4179-a425-71ae10a00000, name=igfs.IgfsSizeSelfTest0, order=1, topVer=2, client=false]
{code}
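
The gap between these two timestamps can be checked with a few lines of Java. This is only an illustrative sketch: the class name LogGap and the helper gap() are hypothetical and not part of Ignite; the pattern assumes the "HH:mm:ss,SSS" layout seen in the log lines above.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Ignite: measures the time between two log lines.
class LogGap {
    // Matches the timestamp layout used in the log lines, e.g. 13:35:08,617
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Returns the time elapsed between two same-day log timestamps. */
    static Duration gap(String first, String second) {
        return Duration.between(LocalTime.parse(first, FMT), LocalTime.parse(second, FMT));
    }

    public static void main(String[] args) {
        Duration d = gap("13:35:08,617", "13:38:33,207");

        // Prints "204590 ms", i.e. about 3 minutes 25 seconds.
        System.out.println(d.toMillis() + " ms");
    }
}
```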

Both operations run on the same thread, and there are no blocking operations between them (see the code below), yet the 2nd line was logged about 3.5 minutes after the 1st.
{code}
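        if (runner.isAlive()) {
            // 1st log line of interest: the timeout is reported.
            U.error(log,
                "Test has been timed out and will be interrupted (threads dump will be taken before interruption) [" +
                "test=" + getName() + ", timeout=" + getTestTimeout() + ']');

            // Collect the started nodes; this call is expected not to block.
            List<Ignite> nodes = IgnitionEx.allGridsx();

            // 2nd log line of interest comes from the first dumpDebugInfo() call.
            for (Ignite node : nodes)
                ((IgniteKernal)node).dumpDebugInfo();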
{code}

IgnitionEx.allGridsx() is designed to be non-blocking: it gets the Ignite nodes with the 'wait' parameter set to 'false'.

So one possible explanation of the hang is that some operating system condition (insufficient memory in the system, for example) causes the test Java process to execute extremely slowly.
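
One way to gather evidence for or against this explanation would be to log a few load indicators right next to the timeout message. The sketch below is an assumption about how that could be done, not existing test framework code: the class SlowProcessProbe and its snapshot() method are hypothetical, and it uses only standard JDK management beans. These values are indirect indicators only; the standard API does not report free physical memory or swap usage, and the load average is negative on platforms where it is unavailable.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical diagnostic helper, not part of Ignite.
class SlowProcessProbe {
    /** Builds a one-line snapshot of JVM and OS load indicators. */
    static String snapshot() {
        Runtime rt = Runtime.getRuntime();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        // Sum the cumulative GC time over all collectors (-1 means "not supported").
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0)
                gcMillis += t;
        }

        long mb = 1024 * 1024;

        return "loadAvg=" + os.getSystemLoadAverage() +
            ", cpus=" + os.getAvailableProcessors() +
            ", heapUsedMb=" + (rt.totalMemory() - rt.freeMemory()) / mb +
            ", heapMaxMb=" + rt.maxMemory() / mb +
            ", gcTotalMs=" + gcMillis;
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

A load average far above the CPU count, or a large jump in gcTotalMs between two snapshots, would point to the process being starved rather than blocked.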

> [Test] IgfsSizeSelfTest.testReplicated sometimes fails with a timeout. 
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-3177
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3177
>             Project: Ignite
>          Issue Type: Test
>          Components: IGFS
>            Reporter: Ivan Veselovsky
>            Assignee: Ivan Veselovsky
>
> In some rare cases test IgfsSizeSelfTest.testReplicated fails with a timeout.
> We have 4 such logs.
> In all known cases a hang up happens on attempt to start the 3rd node.
> I was not able to reproduce the problem locally (test ran ~2200 times without any errors).
> In 1 of 4 cases this is a timeout happening in method  org.apache.ignite.testframework.junits.common.GridCommonAbstractTest#awaitPartitionMapExchange(boolean, boolean) .
> In other 3 cases this is some hang up happening upon the 3rd node start up.


