Posted to common-dev@hadoop.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2007/12/09 09:08:43 UTC

[jira] Created: (HADOOP-2392) [hbase] TestRegionServerExit has new failure mode since HADOOP-2338

[hbase] TestRegionServerExit has new failure mode since HADOOP-2338
-------------------------------------------------------------------

                 Key: HADOOP-2392
                 URL: https://issues.apache.org/jira/browse/HADOOP-2392
             Project: Hadoop
          Issue Type: Bug
          Components: contrib/hbase
    Affects Versions: 0.16.0
            Reporter: Jim Kellerman
            Assignee: Jim Kellerman
             Fix For: 0.16.0


TestRegionServerExit has a new failure mode since HADOOP-2338. It appears that the region server won't exit. Is it possible that a scanner lease is not being cleaned up correctly?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (HADOOP-2392) [hbase] TestRegionServerExit has new failure mode since HADOOP-2338

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550144 ] 

jimk edited comment on HADOOP-2392 at 12/10/07 11:18 AM:
------------------------------------------------------------------

Failed in nightly build #328

This was a strange failure. The region server serving the meta region aborted as expected and its lease timed out. The master then started processing the shutdown: it split the region server's log and scanned the root region, where it discovered that the downed server had been serving the meta region. The meta region was reassigned and opened, and its information was updated in the root region, yet the next root scan got back the old data:

{code}

    [junit] 2007-12-10 11:58:52,417 INFO  [HMaster] org.apache.hadoop.hbase.HMaster$ProcessRegionOpen.process(HMaster.java:2461): updating row .META.,,1 in table -ROOT-,,0 with startcode 1197287919218 and server 140.211.11.75:39266

    [junit] 2007-12-10 11:59:00,221 INFO  [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:213): HMaster.rootScanner scanning meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:39206}
    [junit] 2007-12-10 11:59:00,317 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:249): HMaster.rootScanner regioninfo: {regionname: .META.,,1, startKey: <>, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: 140.211.11.75:39205, startCode: 1197287908915
    [junit] 2007-12-10 11:59:00,317 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.checkAssigned(HMaster.java:469): Current assignment of .META.,,1 is no good
{code}

As to why the test timed out: one of the region servers failed to exit completely (HRegionServer.run).

      was (Author: jimk):
    Failed in nightly build #328
  


[jira] Updated: (HADOOP-2392) [hbase] TestRegionServerExit has new failure mode since HADOOP-2338

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2392:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-2392) [hbase] TestRegionServerExit has new failure mode since HADOOP-2338

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549822 ] 

Hadoop QA commented on HADOOP-2392:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12371307/patch.txt
against trunk revision r602633.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1305/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1305/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1305/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1305/console

This message is automatically generated.



[jira] Updated: (HADOOP-2392) [hbase] TestRegionServerExit has new failure mode since HADOOP-2338

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2392:
----------------------------------

    Attachment: patch.txt



[jira] Updated: (HADOOP-2392) [hbase] TestRegionServerExit has new failure mode since HADOOP-2338

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2392:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Tests passed. Committed.



[jira] Commented: (HADOOP-2392) [hbase] TestRegionServerExit has new failure mode since HADOOP-2338

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550033 ] 

Hudson commented on HADOOP-2392:
--------------------------------

Integrated in Hadoop-Nightly #328 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/328/])



[jira] Updated: (HADOOP-2392) [hbase] TestRegionServerExit has new failure mode since HADOOP-2338

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2392:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Tests passed. Committed. If the problem should reoccur, we will open a new Jira.



[jira] Commented: (HADOOP-2392) [hbase] TestRegionServerExit has new failure mode since HADOOP-2338

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550761 ] 

Hadoop QA commented on HADOOP-2392:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12371466/patch.txt
against trunk revision r603315.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1321/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1321/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1321/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1321/console

This message is automatically generated.



[jira] Updated: (HADOOP-2392) [hbase] TestRegionServerExit has new failure mode since HADOOP-2338

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2392:
----------------------------------

    Status: Patch Available  (was: Reopened)

Works for me. Try Hudson.



[jira] Reopened: (HADOOP-2392) [hbase] TestRegionServerExit has new failure mode since HADOOP-2338

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reopened HADOOP-2392:
-----------------------------------


Failed in nightly build #328



[jira] Updated: (HADOOP-2392) [hbase] TestRegionServerExit has new failure mode since HADOOP-2338

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2392:
----------------------------------

    Attachment: patch.txt

HADOOP-2392, HADOOP-2324:

Chore
- initialChore() now returns boolean

HMaster
- rather than retry in root and meta scanners, return if a scan fails. It will get retried on the next scan. This has two effects: 1) scanners exit more quickly during shutdown and 2) they don't keep retrying to connect to a dead server, allowing them to recover from a server going down more quickly.
- initialScan in root and meta scanners returns boolean, and the scanners do not progress to maintenanceScan until the initial scan completes successfully.
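
The Chore and HMaster changes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in patch.txt: SketchChore and FlakyScanner are hypothetical names, and only the idea that initialChore() returns a boolean and is retried on the next period (rather than in an inner retry loop) comes from the notes above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Illustrative sketch only; NOT the actual HBase Chore class.
 * SketchChore and FlakyScanner are hypothetical names.
 */
abstract class SketchChore implements Runnable {
    private final long periodMillis;
    protected final AtomicBoolean stop;

    SketchChore(long periodMillis, AtomicBoolean stop) {
        this.periodMillis = periodMillis;
        this.stop = stop;
    }

    /** Returns true once the one-time initial work has succeeded. */
    protected abstract boolean initialChore();

    /** Periodic work; only runs after initialChore() has returned true. */
    protected abstract void chore();

    @Override
    public void run() {
        boolean initialized = false;
        while (!stop.get()) {
            if (!initialized) {
                // No inner retry loop: a failed attempt is simply tried
                // again on the next period, so a stop request is noticed
                // between attempts.
                initialized = initialChore();
            } else {
                chore();
            }
            try {
                Thread.sleep(periodMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}

/** Hypothetical scanner whose initial scan fails a fixed number of times. */
class FlakyScanner extends SketchChore {
    private int failuresLeft;
    int initialAttempts = 0;
    int maintenanceScans = 0;

    FlakyScanner(int failures, AtomicBoolean stop) {
        super(1L, stop);
        this.failuresLeft = failures;
    }

    @Override
    protected boolean initialChore() {
        initialAttempts++;
        if (failuresLeft > 0) {
            failuresLeft--;
            return false; // scan failed; give up until the next period
        }
        return true;
    }

    @Override
    protected void chore() {
        maintenanceScans++;
        if (maintenanceScans == 3) {
            stop.set(true); // end the demonstration after three scans
        }
    }
}
```

In this sketch the maintenance scan never starts until an initial scan has succeeded, and a scanner that cannot reach its server checks the stop flag once per period instead of spinning in a retry loop.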

HRegionServer
- speed up region server exit by reordering joins so that we join with threads in the order that we told them to stop
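
A rough sketch of the join-ordering idea, assuming workers are stopped by interruption. OrderedShutdown and its methods are hypothetical and are not taken from HRegionServer. The thread that was asked to stop first has had the longest time to wind down, so joining in the same order is the least likely to block on a thread that has only just been signalled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of HBase. */
class OrderedShutdown {
    private final List<Thread> workers = new ArrayList<>();

    /** Remember the worker; registration order is the stop order. */
    void register(Thread t) {
        workers.add(t);
    }

    /**
     * Asks every worker to stop, then joins them in that same order.
     * Returns the names of the workers that had exited when joined.
     */
    List<String> stopAll(long joinTimeoutMillis) {
        for (Thread t : workers) {
            t.interrupt(); // stop request, issued in registration order
        }
        List<String> joined = new ArrayList<>();
        for (Thread t : workers) {
            try {
                t.join(joinTimeoutMillis); // same order as the stop requests
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            if (!t.isAlive()) {
                joined.add(t.getName());
            }
        }
        return joined;
    }
}
```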

TestTableMapReduce
- remove overrides of heartbeat and thread wake intervals

HADOOP-2396:

HMaster
- move check for null HRegionInfo before first attempt to dereference it.

HADOOP-2397:
- HMaster$BaseScanner.checkAssigned: don't try to split dead server's log if initial startup has completed.

HADOOP-2353:

HMsg
- change toString() to only output the region name rather than calling HRegionInfo.toString()

StaticTestEnvironment
- make logging a bit less verbose

TestHLog
- was writing to local file system and failing on Windows



[jira] Commented: (HADOOP-2392) [hbase] TestRegionServerExit has new failure mode since HADOOP-2338

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549919 ] 

Jim Kellerman commented on HADOOP-2392:
---------------------------------------

HMaster: stop root and meta scanners after region servers are quiesced - this should fix TestRegionServerExit.
HStore, TestTableMapReduce: reduce logging verbosity
TestTableJoinMapReduce - make shutdown more like other hbase test cases.



[jira] Commented: (HADOOP-2392) [hbase] TestRegionServerExit has new failure mode since HADOOP-2338

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550940 ] 

Hudson commented on HADOOP-2392:
--------------------------------

Integrated in Hadoop-Nightly #330 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/330/])
