Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2013/08/15 00:21:48 UTC

[jira] [Comment Edited] (HBASE-9217) TestReplicationSmallTests#testDisableEnable fails intermittently

    [ https://issues.apache.org/jira/browse/HBASE-9217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740291#comment-13740291 ] 

Jean-Daniel Cryans edited comment on HBASE-9217 at 8/14/13 10:21 PM:
---------------------------------------------------------------------

I think it's actually just bad timing. The region servers had just gone to sleep for 10 seconds before re-checking whether replication is enabled:

{noformat}
2013-08-14 08:06:53,189 DEBUG [RS:0;vesta:43149-EventThread.replicationSource,2] regionserver.ReplicationSource(578): Replication is disabled, sleeping 100 times 10
2013-08-14 08:06:53,302 DEBUG [RS:1;vesta:39079-EventThread.replicationSource,2] regionserver.ReplicationSource(578): Replication is disabled, sleeping 100 times 10
{noformat}

But the test only waits about 5 seconds for the row to show up:

{noformat}
2013-08-14 08:06:53,390 INFO  [Thread-2823] replication.ReplicationPeersZKImpl(123): peer 2 is enabled
2013-08-14 08:06:53,392 INFO  [Thread-2823] replication.TestReplicationSmallTests(306): Row not available
...
2013-08-14 08:06:57,902 INFO  [Thread-2823] replication.TestReplicationSmallTests(306): Row not available
{noformat}

So I'm surprised it doesn't happen more often!
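
To make the mismatch concrete, here is a rough sketch of the arithmetic, not the actual test or ReplicationSource code. It assumes the 10-second source sleep and the roughly 5-second test wait described above, and it further assumes the test wait is made of 10 polls spaced 500 ms apart (a guess based on the log timestamps). The class and method names are hypothetical.

```java
// Hypothetical sketch: compares how long a replication source sleeps
// with how long a test keeps polling for the replicated row.
public final class ReplicationTimingSketch {

    /** Total time the test keeps polling: retries * pause between polls. */
    static long totalWaitMs(int retries, long pauseMs) {
        return retries * pauseMs;
    }

    /**
     * True if a source that started sleeping when the peer was enabled
     * wakes up before the test stops polling.
     */
    static boolean sourceWakesInTime(long sourceSleepMs, int retries, long pauseMs) {
        return sourceSleepMs < totalWaitMs(retries, pauseMs);
    }

    /** Smallest retry count whose polling window outlasts one source sleep. */
    static int retriesNeeded(long sourceSleepMs, long pauseMs) {
        return (int) (sourceSleepMs / pauseMs) + 1;
    }

    public static void main(String[] args) {
        long sourceSleepMs = 10_000; // assumed: the 10-second sleep from the comment
        int retries = 10;            // assumed: number of polls in the test
        long pauseMs = 500;          // assumed: pause between polls

        System.out.println("test waits " + totalWaitMs(retries, pauseMs) + " ms");
        System.out.println("source wakes in time: "
                + sourceWakesInTime(sourceSleepMs, retries, pauseMs));
        System.out.println("retries needed: " + retriesNeeded(sourceSleepMs, pauseMs));
    }
}
```

Under these assumptions the polling window ends well before the source wakes up, so the test can only pass when the source happens to be near the end of its sleep at the moment the peer is re-enabled. Lengthening the test's wait or shortening the source's sleep would both close the gap; which one fits the test better is a separate question.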
                
> TestReplicationSmallTests#testDisableEnable fails intermittently
> ----------------------------------------------------------------
>
>                 Key: HBASE-9217
>                 URL: https://issues.apache.org/jira/browse/HBASE-9217
>             Project: HBase
>          Issue Type: Test
>            Reporter: Ted Yu
>         Attachments: testDisableEnable.txt
>
>
> From https://builds.apache.org/job/HBase-0.95/444/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationSmallTests/testDisableEnable/ :
> {code}
> java.lang.AssertionError: Waited too much time for put replication
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.apache.hadoop.hbase.replication.TestReplicationSmallTests.testDisableEnable(TestReplicationSmallTests.java:313)
> ...
> 2013-08-14 08:06:47,228 DEBUG [RS:1;vesta:39079-EventThread.replicationSource,2] wal.ProtobufLogReader(118): After reading the trailer: walEditsStopOffset: 0, fileLength: 0, trailerPresent: false
> 2013-08-14 08:06:47,228 WARN  [RS:1;vesta:39079-EventThread.replicationSource,2] regionserver.ReplicationSource(301): 2 Got: 
> java.io.IOException: Cannot seek after EOF
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.seek(DFSClient.java:2593)
> 	at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.seekOnFs(ProtobufLogReader.java:275)
> 	at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.seek(ReaderBase.java:115)
> 	at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.seek(ReplicationHLogReaderManager.java:108)
> 	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:388)
> 	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:297)
> 2013-08-14 08:06:47,228 DEBUG [RS:1;vesta:39079-EventThread.replicationSource,2] regionserver.ReplicationSource(578): Nothing to replicate, sleeping 100 times 1
> 2013-08-14 08:06:47,329 DEBUG [RS:1;vesta:39079-EventThread.replicationSource,2] fs.HFileSystem$ReorderWALBlocks(327): /user/jenkins/hbase/WALs/vesta.apache.org,39079,1376467506138/vesta.apache.org%2C39079%2C1376467506138.1376467603252 is an HLog file, so reordering blocks, last hostname will be:vesta.apache.org
> {code}
> Looking at test output from successful builds, I didn't see the above exception.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira