Posted to common-dev@hadoop.apache.org by "stack (JIRA)" <ji...@apache.org> on 2007/10/12 07:46:50 UTC

[jira] Commented: (HADOOP-2040) [hbase] TestHStoreFile/TestBloomFilter hang occasionally on hudson AFTER test has finished

    [ https://issues.apache.org/jira/browse/HADOOP-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534225 ] 

stack commented on HADOOP-2040:
-------------------------------

Looks like it hung again in the same build -- #931 -- but this time in a test that hasn't been prone to hanging, TestListTables.  Again I can't get a thread dump, but the log is interesting on the way out:

{code}
    [junit] Shutting down the Mini HDFS Cluster
    [junit] Shutting down DataNode 1
    [junit] 2007-10-12 05:23:16,082 WARN  [DataNode: [/export/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data3,/export/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data4]] org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:596): java.io.IOException: java.lang.InterruptedException
    [junit] Shutting down DataNode 0
    [junit] 	at org.apache.hadoop.fs.ShellCommand.runCommand(ShellCommand.java:59)
    [junit] 	at org.apache.hadoop.fs.ShellCommand.run(ShellCommand.java:42)
    [junit] 	at org.apache.hadoop.fs.DU.getUsed(DU.java:52)
    [junit] 	at org.apache.hadoop.dfs.FSDataset$FSVolume.getDfsUsed(FSDataset.java:299)
    [junit] 	at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getDfsUsed(FSDataset.java:396)
    [junit] 	at org.apache.hadoop.dfs.FSDataset.getDfsUsed(FSDataset.java:495)
    [junit] 	at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:520)
    [junit] 	at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1494)
    [junit] 	at java.lang.Thread.run(Thread.java:595)

    [junit] 2007-10-12 05:23:16,349 WARN  [DataNode: [/export/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data1,/export/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data2]] org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:596): java.io.InterruptedIOException
    [junit] 	at java.net.SocketOutputStream.socketWrite0(Native Method)
    [junit] 	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    [junit] 	at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    [junit] 	at org.apache.hadoop.ipc.Client$Connection$2.write(Client.java:192)
    [junit] 	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    [junit] 	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    [junit] 	at java.io.DataOutputStream.flush(DataOutputStream.java:106)
    [junit] 	at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:327)
    [junit] 	at org.apache.hadoop.ipc.Client.call(Client.java:474)
    [junit] 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
    [junit] 	at org.apache.hadoop.dfs.$Proxy1.sendHeartbeat(Unknown Source)
    [junit] 	at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:520)
    [junit] 	at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1494)
    [junit] 	at java.lang.Thread.run(Thread.java:595)

    [junit] 2007-10-12 05:23:16,351 WARN  [org.apache.hadoop.dfs.PendingReplicationBlocks$PendingReplicationMonitor@157c2bd] org.apache.hadoop.dfs.PendingReplicationBlocks$PendingReplicationMonitor.run(PendingReplicationBlocks.java:186): PendingReplicationMonitor thread received exception. java.lang.InterruptedException: sleep interrupted
    [junit] 2007-10-12 05:23:16,610 INFO  [main] org.apache.hadoop.hbase.MiniHBaseCluster.shutdown(MiniHBaseCluster.java:424): Shutting down FileSystem
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 36.108 sec
{code}
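
As an aside, since I can't seem to get a thread dump from outside the JVM (kill -QUIT/jstack give me nothing), the test teardown could dump the stacks itself on the way out.  Just a rough sketch, nothing in the codebase -- the class and method names below are made up -- using plain Thread.getAllStackTraces():

{code}
// Rough sketch only: dump every live thread's stack from inside the JVM
// for when an external thread dump can't be obtained.
import java.util.Map;

public class ThreadDumper {
  // Walk all live threads and print their stacks to stderr.
  public static void dumpAllStacks() {
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      System.err.println("Thread \"" + t.getName() + "\" state=" + t.getState());
      for (StackTraceElement frame : e.getValue()) {
        System.err.println("    at " + frame);
      }
    }
  }

  public static void main(String[] args) {
    dumpAllStacks();
  }
}
{code}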

It reports that the tests succeeded, but just beforehand it's reporting an interrupted flush.  I wonder if the interrupt broke the flush.  It would be interesting to know (for HADOOP-1924).
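
To illustrate what I mean, here is a minimal sketch of the failure mode I suspect -- this is not the DataNode/IPC code, and the class and method names are made up.  If the thread gets interrupted while flush() is blocked down in socketWrite, the flush comes back as an InterruptedIOException and whatever was still buffered never makes it onto the wire:

{code}
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InterruptedIOException;

// Sketch only -- not HDFS code.  Shows an interrupt during a blocking socket
// write surfacing as InterruptedIOException out of flush(), leaving the
// buffered bytes unsent.
public class InterruptedFlushSketch {
  static void sendAndFlush(BufferedOutputStream out, byte[] payload) {
    try {
      out.write(payload);
      out.flush();                        // can throw InterruptedIOException mid-write
    } catch (InterruptedIOException iioe) {
      // bytesTransferred says how much of the pending data actually went out
      System.err.println("flush interrupted after " + iioe.bytesTransferred + " bytes");
      Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
    } catch (IOException ioe) {
      ioe.printStackTrace();
    }
  }
}
{code}

Even if the lost flush turns out to be harmless here, restoring the interrupt status like above at least keeps the shutdown path aware it was interrupted instead of silently swallowing it.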

> [hbase] TestHStoreFile/TestBloomFilter hang occasionally on hudson AFTER test has finished
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2040
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2040
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Priority: Minor
>
> Weird.  Last night TestBloomFilter was hung after junit had printed that the test had completed without error.  Just now, I noticed a hung TestHStore -- again after junit had printed that the test had succeeded (Nigel Daley has reported he's seen at least two hangs in TestHStoreFile, perhaps in the same location).
> Last night and just now I was unable to get a thread dump.
> Here is the log from around this evening's hang:
> {code}
> ...
>     [junit] 2007-10-12 04:19:28,477 INFO  [main] org.apache.hadoop.hbase.TestHStoreFile.testOutOfRangeMidkeyHalfMapFile(TestHStoreFile.java:366): Last bottom when key > top: zz/zz/1192162768317
>     [junit] 2007-10-12 04:19:28,493 WARN  [IPC Server handler 0 on 36620] org.apache.hadoop.dfs.FSDirectory.unprotectedDelete(FSDirectory.java:400): DIR* FSDirectory.unprotectedDelete: failed to remove /testOutOfRangeMidkeyHalfMapFile because it does not exist
>     [junit] Shutting down the Mini HDFS Cluster
>     [junit] Shutting down DataNode 1
>     [junit] Shutting down DataNode 0
>     [junit] 2007-10-12 04:19:29,316 WARN  [org.apache.hadoop.dfs.PendingReplicationBlocks$PendingReplicationMonitor@ed9f47] org.apache.hadoop.dfs.PendingReplicationBlocks$PendingReplicationMonitor.run(PendingReplicationBlocks.java:186): PendingReplicationMonitor thread received exception. java.lang.InterruptedException: sleep interrupted
>     [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 16.274 sec
>     [junit] Running org.apache.hadoop.hbase.TestHTable
>     [junit] Starting DataNode 0 with dfs.data.dir: /export/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data1,/export/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data2
>     [junit] Starting DataNode 1 with dfs.data.dir: /export/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data3,/export/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/build/contrib/hbase/test/data/dfs/data/data4
>     [junit] 2007-10-12 05:21:48,332 INFO  [main] org.apache.hadoop.hbase.HMaster.<init>(HMaster.java:862): Root region dir: /hbase/hregion_-ROOT-,,0
> ...
> {code}
> Notice the hour of elapsed (hung) time in the above.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.