Posted to common-dev@hadoop.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2007/12/09 09:08:43 UTC
[jira] Created: (HADOOP-2392) [hbase] TestRegionServerExit has new
failure mode since HADOOP-2338
[hbase] TestRegionServerExit has new failure mode since HADOOP-2338
-------------------------------------------------------------------
Key: HADOOP-2392
URL: https://issues.apache.org/jira/browse/HADOOP-2392
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
Fix For: 0.16.0
TestRegionServerExit has a new failure mode since HADOOP-2338. It appears that the region server won't exit. Is it possible that a scanner lease is not being cleaned up correctly?
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (HADOOP-2392) [hbase]
TestRegionServerExit has new failure mode since HADOOP-2338
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550144 ]
jimk edited comment on HADOOP-2392 at 12/10/07 11:18 AM:
------------------------------------------------------------------
Failed in nightly build #328
This was a strange failure. The region server serving the meta region aborted as expected and its lease timed out. The master started processing the shutdown by splitting the region server's log and scanning the root region, where it discovered that the downed server had been serving the meta region. The meta region was reassigned and opened, and its information was updated in the root region, yet the next root scan got back the old data:
{code}
[junit] 2007-12-10 11:58:52,417 INFO [HMaster] org.apache.hadoop.hbase.HMaster$ProcessRegionOpen.process(HMaster.java:2461): updating row .META.,,1 in table -ROOT-,,0 with startcode 1197287919218 and server 140.211.11.75:39266
[junit] 2007-12-10 11:59:00,221 INFO [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:213): HMaster.rootScanner scanning meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:39206}
[junit] 2007-12-10 11:59:00,317 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:249): HMaster.rootScanner regioninfo: {regionname: .META.,,1, startKey: <>, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: 140.211.11.75:39205, startCode: 1197287908915
[junit] 2007-12-10 11:59:00,317 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.checkAssigned(HMaster.java:469): Current assignment of .META.,,1 is no good
{code}
As for why the test timed out: one of the region servers failed to exit completely (HRegionServer.run).
was (Author: jimk):
Failed in nightly build #328
[jira] Updated: (HADOOP-2392) [hbase] TestRegionServerExit has new
failure mode since HADOOP-2338
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-2392:
----------------------------------
Status: Patch Available (was: Open)
[jira] Commented: (HADOOP-2392) [hbase] TestRegionServerExit has
new failure mode since HADOOP-2338
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549822 ]
Hadoop QA commented on HADOOP-2392:
-----------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12371307/patch.txt
against trunk revision r602633.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1305/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1305/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1305/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1305/console
This message is automatically generated.
[jira] Updated: (HADOOP-2392) [hbase] TestRegionServerExit has new
failure mode since HADOOP-2338
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-2392:
----------------------------------
Attachment: patch.txt
[jira] Updated: (HADOOP-2392) [hbase] TestRegionServerExit has new
failure mode since HADOOP-2338
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-2392:
----------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Tests passed. Committed.
[jira] Commented: (HADOOP-2392) [hbase] TestRegionServerExit has
new failure mode since HADOOP-2338
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550033 ]
Hudson commented on HADOOP-2392:
--------------------------------
Integrated in Hadoop-Nightly #328 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/328/])
[jira] Updated: (HADOOP-2392) [hbase] TestRegionServerExit has new
failure mode since HADOOP-2338
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-2392:
----------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Tests passed. Committed. If the problem should reoccur, we will open a new Jira.
[jira] Commented: (HADOOP-2392) [hbase] TestRegionServerExit has
new failure mode since HADOOP-2338
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550761 ]
Hadoop QA commented on HADOOP-2392:
-----------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12371466/patch.txt
against trunk revision r603315.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1321/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1321/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1321/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1321/console
This message is automatically generated.
[jira] Updated: (HADOOP-2392) [hbase] TestRegionServerExit has new
failure mode since HADOOP-2338
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-2392:
----------------------------------
Status: Patch Available (was: Reopened)
Works for me. Try Hudson.
[jira] Reopened: (HADOOP-2392) [hbase] TestRegionServerExit has new
failure mode since HADOOP-2338
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman reopened HADOOP-2392:
-----------------------------------
Failed in nightly build #328
[jira] Updated: (HADOOP-2392) [hbase] TestRegionServerExit has new
failure mode since HADOOP-2338
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-2392:
----------------------------------
Attachment: patch.txt
HADOOP-2392, HADOOP-2324:
Chore
- initialChore() now returns boolean
HMaster
- rather than retrying in the root and meta scanners, return if a scan fails; it will be retried on the next scan. This has two effects: 1) scanners exit more quickly during shutdown, and 2) they don't keep retrying to connect to a dead server, so they recover more quickly when a server goes down.
- initialScan in the root and meta scanners returns boolean, and the scanners do not progress to maintenanceScan until the initial scan completes successfully.
HRegionServer
- speed up region server exit by reordering joins so that we join with threads in the order that we told them to stop
TestTableMapReduce
- remove overrides of heartbeat and thread wake intervals
HADOOP-2396:
HMaster
- move check for null HRegionInfo before first attempt to dereference it.
HADOOP-2397:
- HMaster$BaseScanner.checkAssigned: don't try to split dead server's log if initial startup has completed.
HADOOP-2353:
HMsg
- change toString() to only output the region name rather than calling HRegionInfo.toString()
StaticTestEnvironment
- make logging a bit less verbose
TestHLog
- was writing to local file system and failing on Windows
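The Chore and scanner changes listed above can be illustrated with a small Java sketch. This is not the code from patch.txt: RetryingChore, FlakyScanner, ChoreDemo, runOnce() and the counters are hypothetical names invented for illustration. Only the idea comes from the notes above: initialChore() returns boolean, periodic work does not start until it has succeeded, and a failed attempt is simply retried on the next pass rather than in an inner retry loop.

```java
/** Hypothetical stand-in for a periodic background task; not the real HBase Chore. */
abstract class RetryingChore {
    private boolean initialized = false;

    /** One-time setup; returns true on success, false to be retried on the next pass. */
    protected abstract boolean initialChore();

    /** Periodic work; only invoked once initialChore() has returned true. */
    protected abstract void chore();

    /** One pass of the loop. A real implementation would sleep between passes. */
    final void runOnce() {
        if (!initialized) {
            // No inner retry loop: a failure just waits for the next pass.
            initialized = initialChore();
            return;
        }
        chore();
    }
}

/** Hypothetical scanner whose initial scan fails a fixed number of times. */
class FlakyScanner extends RetryingChore {
    int initialAttempts = 0;
    int maintenanceScans = 0;
    private final int failuresBeforeSuccess;

    FlakyScanner(int failuresBeforeSuccess) {
        this.failuresBeforeSuccess = failuresBeforeSuccess;
    }

    @Override
    protected boolean initialChore() {
        initialAttempts++;
        return initialAttempts > failuresBeforeSuccess;
    }

    @Override
    protected void chore() {
        maintenanceScans++;
    }
}

class ChoreDemo {
    public static void main(String[] args) {
        FlakyScanner scanner = new FlakyScanner(2);
        for (int pass = 0; pass < 5; pass++) {
            scanner.runOnce();
        }
        System.out.println("initial attempts: " + scanner.initialAttempts
            + ", maintenance scans: " + scanner.maintenanceScans);
    }
}
```

In this sketch the first two passes fail, the third completes the initial scan, and only the remaining two passes do maintenance work. Because each pass returns promptly on failure, a stop request checked between passes would be noticed quickly, which is the shutdown benefit the notes describe.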
[jira] Commented: (HADOOP-2392) [hbase] TestRegionServerExit has
new failure mode since HADOOP-2338
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549919 ]
Jim Kellerman commented on HADOOP-2392:
---------------------------------------
HMaster: stop root and meta scanners after region servers are quiesced - this should fix TestRegionServerExit.
HStore, TestTableMapReduce: reduce logging verbosity
TestTableJoinMapReduce - make shutdown more like other hbase test cases.
[jira] Commented: (HADOOP-2392) [hbase] TestRegionServerExit has
new failure mode since HADOOP-2338
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550940 ]
Hudson commented on HADOOP-2392:
--------------------------------
Integrated in Hadoop-Nightly #330 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/330/])