Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2009/03/05 22:35:57 UTC
[jira] Created: (HADOOP-5412) TestInjectionForSimulatedStorage
oaccasionally fails on timeout
TestInjectionForSimulatedStorage oaccasionally fails on timeout
---------------------------------------------------------------
Key: HADOOP-5412
URL: https://issues.apache.org/jira/browse/HADOOP-5412
Project: Hadoop Core
Issue Type: Bug
Affects Versions: 0.18.3
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.18.4
Occasionally TestInjectionForSimulatedStorage falls into an infinite loop, waiting for a block to reach its replication factor. The log repeatedly prints the following message:
dfs.TestInjectionForSimulatedStorage (TestInjectionForSimulatedStorage.java:waitForBlockReplication(89)) - Not enough replicas for 2th block blk_6302924909504458109_1001 yet. Expecting 4, got 2.
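A wait loop like the one behind this message can be given a deadline, so that a block that never reaches its replication factor fails the test promptly instead of spinning. The following is only a minimal sketch under stated assumptions: ReplicaWait, waitForReplicas, and the IntSupplier used to read the replica count are hypothetical names for illustration, not the code in TestInjectionForSimulatedStorage.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Hypothetical helper for illustration; not the actual Hadoop test code.
class ReplicaWait {
    /**
     * Polls replicaCount until it reports at least `expected` replicas.
     * Throws IllegalStateException once timeoutMs has elapsed without success.
     */
    static void waitForReplicas(IntSupplier replicaCount, int expected,
                                long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            int got = replicaCount.getAsInt();
            if (got >= expected) {
                return; // replication factor reached
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException(
                    "Timed out. Expecting " + expected + ", got " + got + ".");
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for replicas", e);
            }
        }
    }
}
```

A bounded wait only makes the failure visible sooner; it does not address why the block stays under-replicated.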
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-5412) TestInjectionForSimulatedStorage
oaccasionally fails on timeout
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-5412:
----------------------------------
Attachment: simulatedLoop.patch
This patch is for trunk. The previous one is for 0.18.
[jira] Updated: (HADOOP-5412) TestInjectionForSimulatedStorage
oaccasionally fails on timeout
Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghu Angadi updated HADOOP-5412:
---------------------------------
Hadoop Flags: [Reviewed]
[jira] Commented: (HADOOP-5412) TestInjectionForSimulatedStorage
oaccasionally fails on timeout
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681019#action_12681019 ]
Hairong Kuang commented on HADOOP-5412:
---------------------------------------
ant test-patch succeeded:
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 6 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
ant test-core passed too.
[jira] Resolved: (HADOOP-5412) TestInjectionForSimulatedStorage
oaccasionally fails on timeout
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang resolved HADOOP-5412.
-----------------------------------
Resolution: Fixed
I've committed this.
[jira] Updated: (HADOOP-5412) TestInjectionForSimulatedStorage
oaccasionally fails on timeout
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-5412:
----------------------------------
Attachment: simulatedLoop.patch
The patch does the following:
1. Prevent a simulated datanode from writing to a block that is already being written;
2. In the test, make sure that all datanodes are up before injecting blocks. This greatly reduces the chance that the NN schedules two concurrent replications to the same datanode.
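The first item can be pictured as a guard on the simulated block store: a block that already has a writer rejects a second one. The sketch below is an assumption-laden illustration, not the contents of simulatedLoop.patch; GuardedBlockStore and its methods are hypothetical names, and rejecting writes to an already finalized block is an extra assumption made here for simplicity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a simulated datanode's block store; not Hadoop code.
class GuardedBlockStore {
    private enum State { BEING_WRITTEN, FINALIZED }

    private final Map<Long, State> blocks = new HashMap<>();

    /** Returns true if this caller became the block's only writer, false if the write is rejected. */
    synchronized boolean tryStartWrite(long blockId) {
        // putIfAbsent returns null only when no entry existed for this block.
        return blocks.putIfAbsent(blockId, State.BEING_WRITTEN) == null;
    }

    /** Marks a block as complete; only valid for a block that is currently being written. */
    synchronized void finalizeBlock(long blockId) {
        if (blocks.get(blockId) != State.BEING_WRITTEN) {
            throw new IllegalStateException("Block " + blockId + " is not being written");
        }
        blocks.put(blockId, State.FINALIZED);
    }

    synchronized boolean isFinalized(long blockId) {
        return blocks.get(blockId) == State.FINALIZED;
    }
}
```

With such a guard, a second concurrent replication to the same datanode is turned away at the start of the write, so it never reaches the finalize step of a block that another writer has already completed.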
[jira] Commented: (HADOOP-5412) TestInjectionForSimulatedStorage
oaccasionally fails on timeout
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679395#action_12679395 ]
Hairong Kuang commented on HADOOP-5412:
---------------------------------------
The problem was that a simulated DataNode allowed two concurrent writes to the same block. In the above case, the NN happened to schedule two concurrent replications of a block to the same datanode (127.0.0.1:48095). The first replication succeeded, but the second one failed with the error "Finalizing a block that has already been finalized", which caused the block to be deleted on the datanode. Subsequent requests for this datanode to replicate the block failed because the block had been deleted. As a result, the block remained under-replicated until the test timed out.
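The sequence described here can be replayed in a single thread. The sketch below is a simplified model under stated assumptions, not Hadoop's simulated dataset: UnguardedBlockStore and its methods are hypothetical names, and the cleanup-on-failure step is modeled directly from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the unguarded behaviour; not Hadoop code.
class UnguardedBlockStore {
    // blockId -> whether the block has been finalized
    private final Map<Long, Boolean> finalized = new HashMap<>();

    void startWrite(long blockId) {
        // No check: a second writer silently shares the first writer's entry.
        finalized.putIfAbsent(blockId, false);
    }

    /** Finalizes the block; on failure the block is discarded, as in the report. */
    boolean finishWrite(long blockId) {
        if (Boolean.TRUE.equals(finalized.get(blockId))) {
            // Mirrors "Finalizing a block that has already been finalized":
            // the failed write cleans up by deleting the block.
            finalized.remove(blockId);
            return false;
        }
        finalized.put(blockId, true);
        return true;
    }

    /** A block can be sent to other datanodes only if it exists and is finalized. */
    boolean isValid(long blockId) {
        return Boolean.TRUE.equals(finalized.get(blockId));
    }
}
```

Replaying the report: two calls to startWrite for the same block are both accepted, the first finishWrite succeeds, the second fails and removes the block, and isValid then returns false, so the datanode can no longer serve as a replication source for that block.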
[jira] Commented: (HADOOP-5412) TestInjectionForSimulatedStorage
oaccasionally fails on timeout
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679389#action_12679389 ]
Hairong Kuang commented on HADOOP-5412:
---------------------------------------
Below are the related logs that explain what happened to this block:
# INFO dfs.StateChange (FSNamesystem.java:addStoredBlock(2839)) - BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:48092 is added to blk_6302924909504458109_1001 size 32
# INFO dfs.StateChange (FSNamesystem.java:computeReplicationWork(2403)) - BLOCK* ask 127.0.0.1:48092 to replicate blk_6302924909504458109_1001 to datanode(s) 127.0.0.1:48095
# INFO dfs.StateChange (FSNamesystem.java:computeReplicationWork(2403))- BLOCK* ask 127.0.0.1:48092 to replicate blk_6302924909504458109_1001 to datanode(s) 127.0.0.1:48095
# INFO dfs.DataNode (DataNode.java:writeBlock(1205)) - Receiving block blk_6302924909504458109_1001 src: /127.0.0.1:48124 dest: /127.0.0.1:48095
# INFO dfs.DataNode (DataNode.java:writeBlock(1205)) - Receiving block blk_6302924909504458109_1001 src: /127.0.0.1:48125 dest: /127.0.0.1:48095
# INFO dfs.DataNode (DataNode.java:writeBlock(1340)) - Received block blk_6302924909504458109_1001 src: /127.0.0.1:48125 dest: /127.0.0.1:48095 of size 32
# WARN dfs.DataNode (DataNode.java:receiveBlock(2804)) - Exception in receiveBlock for block blk_6302924909504458109_1001 java.io.IOException: Finalizing a block that has already been finalized6302924909504458109
# INFO dfs.StateChange (FSNamesystem.java:addStoredBlock(2839)) - BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:48095 is added to blk_6302924909504458109_1001 size 32
# INFO dfs.StateChange (FSNamesystem.java:computeReplicationWork(2403)) - BLOCK* ask 127.0.0.1:48095 to replicate blk_6302924909504458109_1001 to datanode(s) 127.0.0.1:48101 127.0.0.1:48113
# INFO dfs.DataNode (DataNode.java:transferBlocks(908)) - Can't send invalid block blk_6302924909504458109_1001
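The telltale sign in these logs is the same "ask ... to replicate" request issued twice for one block and one target. A log scan for that pattern might look like the sketch below. It assumes the line format is exactly as quoted above; DuplicateReplicationFinder and findDuplicates are hypothetical names, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log-scanning helper; assumes the log line format quoted above.
class DuplicateReplicationFinder {
    // Captures the source datanode, the block, and the space-separated targets.
    private static final Pattern ASK = Pattern.compile(
        "ask (\\S+) to replicate (blk_\\S+) to datanode\\(s\\) (.+)$");

    /** Returns each "block -> target" pair that was requested more than once. */
    static List<String> findDuplicates(List<String> logLines) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = ASK.matcher(line);
            if (!m.find()) {
                continue; // not a replication request
            }
            for (String target : m.group(3).trim().split("\\s+")) {
                String key = m.group(2) + " -> " + target;
                // Set.add returns false when the pair was already requested.
                if (!seen.add(key) && !duplicates.contains(key)) {
                    duplicates.add(key);
                }
            }
        }
        return duplicates;
    }
}
```

Run over the lines above, this would flag blk_6302924909504458109_1001 -> 127.0.0.1:48095, the pair that was requested twice.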
[jira] Commented: (HADOOP-5412) TestInjectionForSimulatedStorage
occasionally fails on timeout
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681778#action_12681778 ]
Hudson commented on HADOOP-5412:
--------------------------------
Integrated in Hadoop-trunk #778 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/778/])
[jira] Updated: (HADOOP-5412) TestInjectionForSimulatedStorage
occasionally fails on timeout
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-5412:
----------------------------------
Summary: TestInjectionForSimulatedStorage occasionally fails on timeout (was: TestInjectionForSimulatedStorage oaccasionally fails on timeout)
[jira] Commented: (HADOOP-5412) TestInjectionForSimulatedStorage
oaccasionally fails on timeout
Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679754#action_12679754 ]
Raghu Angadi commented on HADOOP-5412:
--------------------------------------
+1.
[jira] Commented: (HADOOP-5412) TestInjectionForSimulatedStorage
oaccasionally fails on timeout
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679416#action_12679416 ]
Hairong Kuang commented on HADOOP-5412:
---------------------------------------
I've run the test 50 times back to back without hitting the infinite loop. Previously I saw the loop before the 20th run.