Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2009/03/05 22:35:57 UTC

[jira] Created: (HADOOP-5412) TestInjectionForSimulatedStorage occasionally fails on timeout

TestInjectionForSimulatedStorage occasionally fails on timeout
--------------------------------------------------------------

                 Key: HADOOP-5412
                 URL: https://issues.apache.org/jira/browse/HADOOP-5412
             Project: Hadoop Core
          Issue Type: Bug
    Affects Versions: 0.18.3
            Reporter: Hairong Kuang
            Assignee: Hairong Kuang
             Fix For: 0.18.4


Occasionally TestInjectionForSimulatedStorage falls into an infinite loop, waiting for a block to reach its replication factor. The log repeatedly prints the following message:
dfs.TestInjectionForSimulatedStorage (TestInjectionForSimulatedStorage.java:waitForBlockReplication(89)) - Not enough replicas for 2th block blk_6302924909504458109_1001 yet. Expecting 4, got 2.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5412) TestInjectionForSimulatedStorage occasionally fails on timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5412:
----------------------------------

    Attachment: simulatedLoop.patch

This patch is for trunk. The previous one is for 0.18.



[jira] Updated: (HADOOP-5412) TestInjectionForSimulatedStorage occasionally fails on timeout

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-5412:
---------------------------------

    Hadoop Flags: [Reviewed]



[jira] Commented: (HADOOP-5412) TestInjectionForSimulatedStorage occasionally fails on timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681019#action_12681019 ] 

Hairong Kuang commented on HADOOP-5412:
---------------------------------------

ant test-patch succeeded:

     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec]

ant test-core passed too.



[jira] Resolved: (HADOOP-5412) TestInjectionForSimulatedStorage occasionally fails on timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang resolved HADOOP-5412.
-----------------------------------

    Resolution: Fixed

I've committed this.



[jira] Updated: (HADOOP-5412) TestInjectionForSimulatedStorage occasionally fails on timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5412:
----------------------------------

    Attachment: simulatedLoop.patch

The patch does the following:
1. Prevents a simulated datanode from writing to a block that is already being written.
2. In the test, makes sure that all datanodes are up before injecting blocks. This greatly reduces the chance that the NN schedules two concurrent replications to the same datanode.
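The first change can be illustrated with a minimal sketch. This is not the code from simulatedLoop.patch; the class GuardedBlockStore and its methods are hypothetical names invented for illustration, and rejecting writes to an already finalized block is an extra assumption. It only shows the idea of refusing a second writer while a block is still open:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a simulated datanode's block store.
// It is NOT Hadoop's actual class; all names here are assumptions.
class GuardedBlockStore {
    private enum State { BEING_WRITTEN, FINALIZED }

    private final Map<Long, State> blocks = new HashMap<>();

    // Open a block for writing. A second writer is refused while the block
    // is still being written (the guard described in item 1).
    synchronized void startWrite(long blockId) throws IOException {
        State state = blocks.get(blockId);
        if (state == State.BEING_WRITTEN) {
            throw new IOException("Block " + blockId + " is being written");
        }
        if (state == State.FINALIZED) {
            // Assumption for this sketch: a finalized block is not reopened.
            throw new IOException("Block " + blockId + " already exists");
        }
        blocks.put(blockId, State.BEING_WRITTEN);
    }

    // Mark an open block as complete.
    synchronized void finalizeBlock(long blockId) throws IOException {
        if (blocks.get(blockId) != State.BEING_WRITTEN) {
            throw new IOException("Block " + blockId + " is not open for writing");
        }
        blocks.put(blockId, State.FINALIZED);
    }

    // A block is valid (can be served or replicated) once it is finalized.
    synchronized boolean isValid(long blockId) {
        return blocks.get(blockId) == State.FINALIZED;
    }

    public static void main(String[] args) throws IOException {
        GuardedBlockStore store = new GuardedBlockStore();
        long blockId = 1001L; // arbitrary id for the demo

        store.startWrite(blockId);          // first replication request
        try {
            store.startWrite(blockId);      // concurrent second request
        } catch (IOException e) {
            System.out.println("Second writer rejected: " + e.getMessage());
        }
        store.finalizeBlock(blockId);       // first writer completes normally
        System.out.println("Block valid: " + store.isValid(blockId));
    }
}
```

With the guard in place, the second request fails up front instead of failing at finalization time, so the replica written by the first request is left untouched.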



[jira] Commented: (HADOOP-5412) TestInjectionForSimulatedStorage occasionally fails on timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679395#action_12679395 ] 

Hairong Kuang commented on HADOOP-5412:
---------------------------------------

The problem was that a simulated DataNode allowed two concurrent writes to the same block. In this case, the NN happened to schedule two concurrent replications of a block to the same datanode (127.0.0.1:48095). The first replication succeeded, but the second one failed with the error "Finalizing a block that has already been finalized", which caused the block to be deleted on the datanode. A subsequent request asking this datanode to replicate the block failed because the block had been deleted. As a result, the block stayed under-replicated until the test timed out.
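The sequence described above can be modelled with a small sketch. The class UnguardedBlockStore and its methods are hypothetical and are not taken from the Hadoop source; the sketch only assumes, as the comment states, that a failed finalization leads to the block being deleted:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the behaviour before the fix. Names are assumptions,
// not Hadoop's real API.
class UnguardedBlockStore {
    // blockId -> true once finalized, false while being written
    private final Map<Long, Boolean> blocks = new HashMap<>();

    // No guard: a second writer may open a block that is already being written.
    void startWrite(long blockId) {
        blocks.putIfAbsent(blockId, false);
    }

    void finalizeBlock(long blockId) throws IOException {
        if (Boolean.TRUE.equals(blocks.get(blockId))) {
            throw new IOException(
                "Finalizing a block that has already been finalized " + blockId);
        }
        blocks.put(blockId, true);
    }

    // Finish receiving a block. On failure the block is removed, which is the
    // cleanup step that destroys the replica written by the other writer.
    boolean finishReceive(long blockId) {
        try {
            finalizeBlock(blockId);
            return true;
        } catch (IOException e) {
            blocks.remove(blockId);
            return false;
        }
    }

    boolean isValid(long blockId) {
        return Boolean.TRUE.equals(blocks.get(blockId));
    }

    public static void main(String[] args) {
        UnguardedBlockStore datanode = new UnguardedBlockStore();
        long blockId = 1001L; // arbitrary id for the demo

        datanode.startWrite(blockId); // replication request 1
        datanode.startWrite(blockId); // replication request 2, same block

        System.out.println("One writer finished: " + datanode.finishReceive(blockId));
        System.out.println("Other writer finished: " + datanode.finishReceive(blockId));
        // The block is gone, so later transfers from this node cannot succeed.
        System.out.println("Block still valid: " + datanode.isValid(blockId));
    }
}
```

In this model the namenode would still believe the datanode holds a replica, since one of the two writes reported success, which is consistent with the later "Can't send invalid block" message in the logs.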



[jira] Commented: (HADOOP-5412) TestInjectionForSimulatedStorage occasionally fails on timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679389#action_12679389 ] 

Hairong Kuang commented on HADOOP-5412:
---------------------------------------

Below are the related logs that explain what happened to this block:
# INFO  dfs.StateChange (FSNamesystem.java:addStoredBlock(2839)) - BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:48092 is added to blk_6302924909504458109_1001 size 32
# INFO  dfs.StateChange (FSNamesystem.java:computeReplicationWork(2403)) - BLOCK* ask 127.0.0.1:48092 to replicate blk_6302924909504458109_1001 to datanode(s) 127.0.0.1:48095
# INFO  dfs.StateChange (FSNamesystem.java:computeReplicationWork(2403))- BLOCK* ask 127.0.0.1:48092 to replicate blk_6302924909504458109_1001 to datanode(s) 127.0.0.1:48095
# INFO  dfs.DataNode (DataNode.java:writeBlock(1205)) - Receiving block blk_6302924909504458109_1001 src: /127.0.0.1:48124 dest: /127.0.0.1:48095 
# INFO dfs.DataNode (DataNode.java:writeBlock(1205)) - Receiving block blk_6302924909504458109_1001 src: /127.0.0.1:48125 dest: /127.0.0.1:48095 
# INFO dfs.DataNode (DataNode.java:writeBlock(1340)) - Received block blk_6302924909504458109_1001 src: /127.0.0.1:48125 dest: /127.0.0.1:48095 of size 32  
# WARN dfs.DataNode (DataNode.java:receiveBlock(2804)) - Exception in receiveBlock for block blk_6302924909504458109_1001 java.io.IOException: Finalizing a block that has already been finalized6302924909504458109
# INFO  dfs.StateChange (FSNamesystem.java:addStoredBlock(2839)) - BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:48095 is added to blk_6302924909504458109_1001 size 32 
# INFO  dfs.StateChange (FSNamesystem.java:computeReplicationWork(2403)) - BLOCK* ask 127.0.0.1:48095 to replicate blk_6302924909504458109_1001 to datanode(s) 127.0.0.1:48101 127.0.0.1:48113
# INFO  dfs.DataNode (DataNode.java:transferBlocks(908)) - Can't send invalid block blk_6302924909504458109_1001



[jira] Commented: (HADOOP-5412) TestInjectionForSimulatedStorage occasionally fails on timeout

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681778#action_12681778 ] 

Hudson commented on HADOOP-5412:
--------------------------------

Integrated in Hadoop-trunk #778 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/778/])
    



[jira] Updated: (HADOOP-5412) TestInjectionForSimulatedStorage occasionally fails on timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5412:
----------------------------------

    Summary: TestInjectionForSimulatedStorage occasionally fails on timeout  (was: TestInjectionForSimulatedStorage oaccasionally fails on timeout)



[jira] Commented: (HADOOP-5412) TestInjectionForSimulatedStorage occasionally fails on timeout

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679754#action_12679754 ] 

Raghu Angadi commented on HADOOP-5412:
--------------------------------------

+1. 



[jira] Commented: (HADOOP-5412) TestInjectionForSimulatedStorage occasionally fails on timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679416#action_12679416 ] 

Hairong Kuang commented on HADOOP-5412:
---------------------------------------

I've run the test 50 times back to back without hitting the infinite loop. Previously, I saw the loop before the 20th run.
