Posted to common-dev@hadoop.apache.org by "stack (JIRA)" <ji...@apache.org> on 2007/10/09 21:52:51 UTC
[jira] Created: (HADOOP-2017) [hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
[hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
--------------------------------------------------------------------------
Key: HADOOP-2017
URL: https://issues.apache.org/jira/browse/HADOOP-2017
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Reporter: stack
Priority: Minor
In patch build #903, the metascanner keeps trying to go to the downed server even though onlineMetaRegions has been updated with the new location, and then the metascanner just goes away (or hangs).
In nightly build #266, it's a similar scenario, only the remaining region servers decide to shut down because they haven't been able to reach the master in 7 seconds.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HADOOP-2017) [hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack reassigned HADOOP-2017:
-----------------------------
Assignee: stack
[jira] Updated: (HADOOP-2017) [hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HADOOP-2017:
--------------------------
Attachment: trsa.patch
A patch with more logging and thread dumping to help show what is going on, plus a mechanism that notices moved regions sooner.
{code}
HADOOP-2017 TestRegionServerAbort failure in patch build #903 and nightly #266
Notice moved META regions sooner. Also added more logging and
thread dumping once a minute when test starts to take too long
so can see where we are hung (if we are hung).
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestHStoreFile.java
Inherit from HBaseTestCase.
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/HBaseClusterTestCase.java
(threadDumpingJoin): Added.
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestRegionServerAbort.java
Run verification in its own thread so can concurrently thread dump if
test is going on too long.
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/DFSAbort.java
Moved join up into parent class.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/Chore.java
Remove unused import.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMaster.java
(MetaRegion.toString): Added.
Added logging around assignment checking and log split.
(MetaRegion.compareTo): Add consideration of server address.
(numberOfMetaRegions, metaRegionsToScan, onlineMetaRegions):
Put declaration and assignment together and made final.
(scanOneMetaRegion): If the region is no longer in onlineMetaRegions,
give up trying to scan.
(unassignRootRegion): Added (Not yet finished).
{code}
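The scanOneMetaRegion and MetaRegion.compareTo changes listed above could look roughly like the following. This is only a sketch, not the code in trsa.patch: the field names, the RegionScan callback, the stillOnline helper, the retry count and the sample addresses are all assumptions made for illustration. It shows a comparison that takes the server address into account, and a retry loop that gives up once the region's entry in onlineMetaRegions no longer matches the location being scanned.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

class MetaScanSketch {
    /** Hypothetical stand-in for HMaster's MetaRegion: a region name plus its hosting server. */
    static final class MetaRegion implements Comparable<MetaRegion> {
        final String regionName;
        final String serverAddress;

        MetaRegion(String regionName, String serverAddress) {
            this.regionName = regionName;
            this.serverAddress = serverAddress;
        }

        // Order by region name first, then by server address, so the same
        // region hosted on a different server does not compare as equal.
        @Override
        public int compareTo(MetaRegion other) {
            int byName = regionName.compareTo(other.regionName);
            return byName != 0 ? byName : serverAddress.compareTo(other.serverAddress);
        }

        @Override
        public String toString() {
            return "regionname: " + regionName + ", server: " + serverAddress;
        }
    }

    /** Hypothetical callback standing in for the real scan of a META region. */
    interface RegionScan {
        void scan(MetaRegion region) throws IOException;
    }

    // Region name -> current location; assumed to be updated elsewhere on reassignment.
    static final Map<String, MetaRegion> onlineMetaRegions = new ConcurrentSkipListMap<>();

    /** True only if the map still lists this region at this exact server. */
    static boolean stillOnline(MetaRegion region) {
        MetaRegion current = onlineMetaRegions.get(region.regionName);
        return current != null && current.compareTo(region) == 0;
    }

    /** Returns true if the scan succeeded, false if it gave up. */
    static boolean scanOneMetaRegion(MetaRegion region, int maxAttempts, RegionScan scan) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (!stillOnline(region)) {
                return false; // region moved or went offline: stop chasing the old server
            }
            try {
                scan.scan(region);
                return true;
            } catch (IOException e) {
                // Server unreachable; loop around and re-check the map before retrying.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        MetaRegion stale = new MetaRegion(".META.,,1", "10.0.0.1:60020");
        onlineMetaRegions.put(stale.regionName, stale);
        // Simulate reassignment of the region to another server.
        onlineMetaRegions.put(stale.regionName, new MetaRegion(".META.,,1", "10.0.0.2:60020"));
        boolean scanned = scanOneMetaRegion(stale, 3, r -> { throw new IOException("server down"); });
        System.out.println("scanned stale location: " + scanned);
    }
}
```

Checking the map before every attempt, rather than once up front, is what lets the scanner notice a move that happens while it is still retrying against the downed server.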
[jira] Updated: (HADOOP-2017) [hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HADOOP-2017:
--------------------------
Fix Version/s: 0.15.0
Status: Patch Available (was: Open)
Builds locally.
[jira] Updated: (HADOOP-2017) [hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HADOOP-2017:
--------------------------
Resolution: Fixed
Fix Version/s: 0.15.0
Status: Resolved (was: Patch Available)
Hasn't recurred since commit. HADOOP-2038 should also make this issue less likely. Also, this test has been moved into TestRegionServerExit. Resolving as fixed.
[jira] Commented: (HADOOP-2017) [hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533478 ]
stack commented on HADOOP-2017:
-------------------------------
Nightly #263 also failed on TestRegionServerAbort (TRSA) in the same manner as patch build #903.
[jira] Commented: (HADOOP-2017) [hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533722 ]
Hudson commented on HADOOP-2017:
--------------------------------
Integrated in Hadoop-Nightly #267 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/267/])
[jira] Commented: (HADOOP-2017) [hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533531 ]
stack commented on HADOOP-2017:
-------------------------------
Applied patch. Now waiting to see if the problem occurs again. If so, the extra logging and thread dumps should help.
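For readers unfamiliar with the thread dumping mentioned here, a join that periodically dumps all thread stacks might look something like this. It is a sketch under assumptions, not the threadDumpingJoin added to HBaseClusterTestCase: the signature, the return value and the output format are invented for illustration, and only standard java.lang.Thread calls are used.

```java
import java.util.Map;

class ThreadDumpingJoinSketch {
    /** Builds a text dump of every live thread's stack via Thread.getAllStackTraces(). */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /**
     * Waits for t to finish, printing a dump of all threads each time
     * intervalMillis elapses while t is still alive.
     * Returns the number of dumps printed (hypothetical, handy for testing).
     */
    static int threadDumpingJoin(Thread t, long intervalMillis) {
        if (intervalMillis <= 0) {
            // join(0) would block forever, so insist on a positive interval.
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        int dumps = 0;
        while (t.isAlive()) {
            try {
                t.join(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                return dumps;
            }
            if (t.isAlive()) {
                System.err.println("Still waiting on " + t.getName() + "\n" + dumpAllThreads());
                dumps++;
            }
        }
        return dumps;
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(250);
            } catch (InterruptedException ignored) {
                // exit early if interrupted
            }
        }, "worker");
        worker.start();
        // The patch dumps once a minute; a short interval is used here so the demo finishes quickly.
        threadDumpingJoin(worker, 100L);
    }
}
```

If the test hangs, the repeated dumps in the build log show where each thread is stuck without anyone having to attach to the build machine.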
[jira] Updated: (HADOOP-2017) [hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HADOOP-2017:
--------------------------
Fix Version/s: (was: 0.15.0)
[jira] Commented: (HADOOP-2017) [hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533514 ]
Hadoop QA commented on HADOOP-2017:
-----------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12367392/trsa.patch
against trunk revision r583037.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/910/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/910/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/910/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/910/console
This message is automatically generated.