You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-dev@hadoop.apache.org by "Chris K Wensel (JIRA)" <ji...@apache.org> on 2008/09/18 17:42:44 UTC

[jira] Created: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
-----------------------------------------------------------------------

                 Key: HADOOP-4202
                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
             Project: Hadoop Core
          Issue Type: Bug
          Components: test
    Affects Versions: 0.18.0
            Reporter: Chris K Wensel
            Priority: Minor


If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645922#action_12645922 ] 

Chris K Wensel commented on HADOOP-4202:
----------------------------------------

The slow down is within ant, not an IDE. 

And yes to compensate, I have reorganized my tests so too many daemon threads don't get stuck in a particular jvm. 

And yes, for now, I want a pristine cluster for every relevant test method, not just for every TestCase, and I don't want an explosion of TestCase classes just to provide a test on a variation of a particular theme.

since Cascading allows for very short/concise expression of  a data processing usecase, and is easily parameterized, having lots of tests with minor variations does not render a huge unwieldy TestCase class. (though one in particular did recently cross that line, and has been recently beaten back)

And yes, I am accepting patches that clean up my crufty tests. But I would be happy with threads that listen and die when told to.

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.1
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Minor
>         Attachments: hadoop-4202.patch
>
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris K Wensel updated HADOOP-4202:
-----------------------------------

    Priority: Major  (was: Minor)

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.0
>            Reporter: Chris K Wensel
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632664#action_12632664 ] 

Steve Loughran commented on HADOOP-4202:
----------------------------------------

looking at the code in HADOOP-3628 to shut down the threads in more detail, there is a fair amount of work at the end of TaskTracker to do this

- a TaskCleanupThread class is defined that can be terminated by calling its terminate() method; this flips a volatile flag and interrupts the loop.

- the CleanupQueue has a special EndQueue class that can be pushed into the cleanup queue; when this is hit the task ends. TestCleanupQueue verifies that is all works.

I'd advocate not worrying about this for versions <0.20 as it only arises in testing when you are running lots of test cases in the same VM, perhaps under an IDE. It should not surface in production where the process gets terminated. When the 3628 lifecycle work does get merged in, this is included as part of the feature set. 

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.0
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-4202:
----------------------------------

    Status: Open  (was: Patch Available)

This seems like the wrong fix for two reasons.

# Changing the worker threads to poll at an arbitrary duration solely to accommodate the unit tests is the wrong tradeoff.
# The default junit policy (also explicitly specified in build.xml) spawns a new JVM for each TestCase, so it's unclear to me why this should even be an issue. If a TestCase is creating a new MiniMRCluster for each test, then- in almost all cases- it is using it incorrectly and the TestCase should be rewritten.

Have I misunderstood the problem? Is this a regression? Is it still an issue? Hudson doesn't seem to be running any slower...

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.1
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Minor
>         Attachments: hadoop-4202.patch
>
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris K Wensel reassigned HADOOP-4202:
--------------------------------------

    Assignee: Chris K Wensel

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.0
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris K Wensel updated HADOOP-4202:
-----------------------------------

    Attachment: hadoop-4202.patch

Have updated the daemon threads to check the 'running' state, and if not running, will silently exit. to do this, a couple threads are now calling poll() instead of take(), with a timeout of 10 seconds. this value is arbitrary. 

I haven't added a test case to guaranteed the threads are gone as this patch is so MiniMRCluster behaves a little better. and it looks like lifecycle interfaces will be added in a future patch. please advise otherwise.

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.0
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>         Attachments: hadoop-4202.patch
>
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645925#action_12645925 ] 

Chris Douglas commented on HADOOP-4202:
---------------------------------------

Oh, I see. Would the following idiom work for your tests? It's used in a few TestCase implementations (TestReduceFetch, TestDatamerge, etc.) that require a cluster and run several variations:

{code}
  private static MiniMRCluster mrCluster = null;
  private static MiniDFSCluster dfsCluster = null;
  public static Test suite() {
    TestSetup setup = new TestSetup(new TestSuite(TestReduceFetch.class)) {
      protected void setUp() throws Exception {
        Configuration conf = new Configuration();
        dfsCluster = new MiniDFSCluster(conf, 2, true, null);
        mrCluster = new MiniMRCluster(2,
            dfsCluster.getFileSystem().getUri().toString(), 1);
      }
      protected void tearDown() throws Exception {
        if (dfsCluster != null) { dfsCluster.shutdown(); }
        if (mrCluster != null) { mrCluster.shutdown(); }
      }
    };
    return setup;
  }   
{code}

The {{test\*}} methods all use the same, static cluster. Unless you're testing sets of parameters required during the TT init, it should be sufficient.

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.1
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Minor
>         Attachments: hadoop-4202.patch
>
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645927#action_12645927 ] 

Chris K Wensel commented on HADOOP-4202:
----------------------------------------

Thanks for the code snippet. 

I'm unsure how this changes things unless its some fu to get around ant/junit spawn semantics. Things are bearable now, time permitting I'll give this a deeper look. 

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.1
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Minor
>         Attachments: hadoop-4202.patch
>
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632728#action_12632728 ] 

Chris K Wensel commented on HADOOP-4202:
----------------------------------------

I'm still seeing some MiniXXXCluster goofyness. I'll try HADOOP-3628 and see how it fares. Thanks for the comments Steve.

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.1
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>         Attachments: hadoop-4202.patch
>
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas resolved HADOOP-4202.
-----------------------------------

    Resolution: Later

I'll close this for now. If it continues to be a problem and can't wait for HADOOP-3628, it can be reopened.

In what environment is this causing problems? Per (2), I'm having trouble imagining how a correctly written TestCase would run into this.

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.1
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Minor
>         Attachments: hadoop-4202.patch
>
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645932#action_12645932 ] 

Chris Douglas commented on HADOOP-4202:
---------------------------------------

bq. I'm unsure how this changes things unless its some fu to get around ant/junit spawn semantics.

This starts a MiniMRCluster and a MiniDFSCluster for the TestCase, not for each test\* run. It's sped up several of our tests. Depending on how literally you mean "pristine", TestCase::setUp and TestCase::tearDown might help you clean up any state left from previous tests, too.

Sorry, I thought these were Hadoop tests that were taking longer. My comments about "correctness" were scoped to our project and its requirements, only.

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.1
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Minor
>         Attachments: hadoop-4202.patch
>
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632255#action_12632255 ] 

Chris K Wensel commented on HADOOP-4202:
----------------------------------------

I think HADOOP-3546 is responsible for the change.

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.0
>            Reporter: Chris K Wensel
>            Priority: Minor
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632646#action_12632646 ] 

Steve Loughran commented on HADOOP-4202:
----------------------------------------

In the latest patch for HADOOP-3628 there's a stopCleanupThreads() method that does this; it gets invoked when the TaskTracker gets terminated.

All the services lack tests to ensure that they shut down reliably in-process; the assumption in production is that the hosting JVM is killed. If we really want to have them fully cleaned up when terminated, we need tests to verify that threads are gone, and that http daemons aren't listening on the designated ports. 

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.0
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645707#action_12645707 ] 

Chris K Wensel commented on HADOOP-4202:
----------------------------------------

Patch right or wrong, threads should clean up. Looks like this is being addressed elsewhere (see above). Happy to see this issue closed.

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.1
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Minor
>         Attachments: hadoop-4202.patch
>
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris K Wensel updated HADOOP-4202:
-----------------------------------

    Priority: Minor  (was: Major)

Changed to minor.

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.1
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>            Priority: Minor
>         Attachments: hadoop-4202.patch
>
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4202) TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris K Wensel updated HADOOP-4202:
-----------------------------------

    Affects Version/s:     (was: 0.18.0)
                       0.18.1
               Status: Patch Available  (was: Open)

> TaskTracker never stops cleanup threads, MiniMRCluster becomes unstable
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4202
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.18.1
>            Reporter: Chris K Wensel
>            Assignee: Chris K Wensel
>         Attachments: hadoop-4202.patch
>
>
> If many unit tests start/stop unique MiniMRCluster instances, over time the number of threads in the test vm grow to large causing tests to hang and/or slow down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.