
Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2007/03/11 02:50:09 UTC

[jira] Created: (HADOOP-1108) CLONE -NNBench generates millions of NotReplicatedYetException in Namenode log

CLONE -NNBench generates millions of NotReplicatedYetException in Namenode log
------------------------------------------------------------------------------

                 Key: HADOOP-1108
                 URL: https://issues.apache.org/jira/browse/HADOOP-1108
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.12.0
            Reporter: dhruba borthakur
         Assigned To: dhruba borthakur
            Priority: Blocker
             Fix For: 0.12.1


Running NNBench on latest trunk (0.12.1 candidate) on a few hundred nodes yielded 2.3 million of these exceptions in the NN log:

   2007-03-08 09:23:03,053 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020 call error:
   org.apache.hadoop.dfs.NotReplicatedYetException: Not replicated yet
        at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:803)
        at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:309)
        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)

I run NNBench to create files with block size set to 1 and replication set to 1.  NNBench then writes 1 byte to each file.  Minimum replication for the cluster is the default, i.e. 1.  If NNBench encounters an exception during either the create or the write operation, it loops and tries again.  Multiply this by 1000 files per node and a few hundred nodes.
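
The retry behaviour described above can be sketched roughly as follows. This is not NNBench's actual code: the Operation interface, the failingFirst helper, and the figures of 3 failures per file and 300 nodes are all assumptions made up for illustration. It only shows how an unbounded retry loop turns every failed call into one more exception on the server side.

```java
import java.io.IOException;

public class RetryLoopSketch {
    /** Hypothetical stand-in for a create or write call against the namenode. */
    public interface Operation {
        void run() throws IOException;
    }

    /** Retries op until it succeeds and returns how many attempts failed. */
    public static int retryUntilSuccess(Operation op) {
        int failures = 0;
        while (true) {
            try {
                op.run();
                return failures;
            } catch (IOException e) {
                // Each failed attempt corresponds to one exception in the server log.
                failures++;
            }
        }
    }

    /** Builds an operation that fails the first n times it is run, then succeeds. */
    public static Operation failingFirst(final int n) {
        final int[] calls = {0};
        return () -> {
            if (calls[0]++ < n) {
                throw new IOException("Not replicated yet");
            }
        };
    }

    public static void main(String[] args) {
        // Assumed for illustration only: 3 failures per file, 300 nodes.
        int perFile = retryUntilSuccess(failingFirst(3));
        long total = (long) perFile * 1000 * 300; // 1000 files per node
        System.out.println(perFile + " failures per file, " + total + " in total");
    }
}
```

With no backoff and no retry limit, the number of logged exceptions grows with failures per file times files per node times nodes, so even a handful of failures per file reaches the millions at this scale.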


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-1108) Checksumed file system should retry reading if a different replica is found when handle ChecksumException

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1108:
----------------------------------

       Assignee: Hairong Kuang  (was: dhruba borthakur)
    Description: Currently there is a bug in the code: when handling a ChecksumException, a checksummed file system throws an exception if a different replica is found, and retries otherwise.  (was: Running NNBench on latest trunk (0.12.1 candidate) on a few hundred nodes yielded 2.3 million of these exceptions in the NN log:

   2007-03-08 09:23:03,053 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020 call error:
   org.apache.hadoop.dfs.NotReplicatedYetException: Not replicated yet
        at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:803)
        at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:309)
        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)

I run NNBench to create files with block size set to 1 and replication set to 1.  NNBench then writes 1 byte to the file.  Minimum replication for the cluster is the default, ie 1.  If it encounters an exception while trying to do either the create or write operations, it loops and tries again.  Multiply this by 1000 files per node and a few hundred nodes.
)
        Summary: Checksumed file system should  retry reading if a different replica is found when handle ChecksumException  (was: CLONE -NNBench generates millions of NotReplicatedYetException in Namenode log)

> Checksumed file system should  retry reading if a different replica is found when handle ChecksumException
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1108
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1108
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.12.0
>            Reporter: dhruba borthakur
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.12.1
>
>         Attachments: notyetreplciatedexception.patch
>
>
> Currently there is a bug in the code: when handling a ChecksumException, a checksummed file system throws an exception if a different replica is found, and retries otherwise.

[jira] Commented: (HADOOP-1108) Checksumed file system should retry reading if a different replica is found when handle ChecksumException

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480194 ] 

dhruba borthakur commented on HADOOP-1108:
------------------------------------------

+1. Code reviewed.

> Checksumed file system should  retry reading if a different replica is found when handle ChecksumException
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1108
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1108
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.12.0
>            Reporter: dhruba borthakur
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.12.1
>
>         Attachments: seekNewSource.patch
>
>
> Currently there is a bug in the code: when handling a ChecksumException, a checksummed file system throws an exception if a different replica is found, and retries otherwise.

[jira] Updated: (HADOOP-1108) CLONE -NNBench generates millions of NotReplicatedYetException in Namenode log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1108:
-------------------------------------

    Attachment: notyetreplciatedexception.patch

If the checksum filesystem encounters an error while uploading a block to the first datanode, then it should try the other datanodes.
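
A minimal sketch of that fallback, under stated assumptions: the Uploader interface, the uploadToFirstWorking method, and the datanode names dn1 to dn3 are hypothetical and are not taken from the attached patch or from the Hadoop API.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class AlternateTargetSketch {
    /** Hypothetical stand-in for sending one block to one datanode. */
    public interface Uploader {
        void upload(String target) throws IOException;
    }

    /**
     * Tries each target in order and returns the first one that accepts the block.
     * Throws only after every target has failed.
     */
    public static String uploadToFirstWorking(List<String> targets, Uploader uploader)
            throws IOException {
        IOException last = null;
        for (String target : targets) {
            try {
                uploader.upload(target);
                return target;
            } catch (IOException e) {
                last = e; // remember the failure and move on to the next target
            }
        }
        throw new IOException("all targets failed", last);
    }

    public static void main(String[] args) throws IOException {
        List<String> targets = Arrays.asList("dn1", "dn2", "dn3");
        // Pretend dn1 is unreachable.
        String used = uploadToFirstWorking(targets, t -> {
            if (t.equals("dn1")) {
                throw new IOException("connection refused");
            }
        });
        System.out.println("block stored on " + used);
    }
}
```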

> CLONE -NNBench generates millions of NotReplicatedYetException in Namenode log
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-1108
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1108
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.12.0
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.12.1
>
>         Attachments: notyetreplciatedexception.patch
>
>
> Running NNBench on latest trunk (0.12.1 candidate) on a few hundred nodes yielded 2.3 million of these exceptions in the NN log:
>    2007-03-08 09:23:03,053 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020 call error:
>    org.apache.hadoop.dfs.NotReplicatedYetException: Not replicated yet
>         at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:803)
>         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:309)
>         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
> I run NNBench to create files with block size set to 1 and replication set to 1.  NNBench then writes 1 byte to each file.  Minimum replication for the cluster is the default, i.e. 1.  If NNBench encounters an exception during either the create or the write operation, it loops and tries again.  Multiply this by 1000 files per node and a few hundred nodes.

[jira] Updated: (HADOOP-1108) Checksumed file system should retry reading if a different replica is found when handle ChecksumException

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1108:
----------------------------------

    Attachment: seekNewSource.patch

I fixed the described bug plus a couple of minor bugs in ChecksumFileSystem.FSInputChecker.
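
For readers without the patch at hand, the intended behaviour (retry when a different replica is found, give up otherwise) might look roughly like the sketch below. It is not the contents of seekNewSource.patch: ReplicaReader, ChecksumMismatch, FakeReader and readWithRetry are hypothetical names, and the seekToNewSource method here is only a simplified stand-in for whatever the real input checker calls.

```java
import java.io.IOException;

public class ChecksumRetrySketch {
    /** Hypothetical stand-in for a checksum failure on the current replica. */
    public static class ChecksumMismatch extends IOException {
        public ChecksumMismatch(String message) {
            super(message);
        }
    }

    /** Hypothetical reader that can switch to another copy of the data. */
    public interface ReplicaReader {
        byte[] read() throws IOException;

        /** Returns true if a different replica was found and selected. */
        boolean seekToNewSource();
    }

    /** Retries the read on a new replica; rethrows if none is left or retries run out. */
    public static byte[] readWithRetry(ReplicaReader reader, int maxRetries)
            throws IOException {
        int retriesLeft = maxRetries;
        while (true) {
            try {
                return reader.read();
            } catch (ChecksumMismatch e) {
                if (retriesLeft-- <= 0 || !reader.seekToNewSource()) {
                    throw e; // no other replica to try
                }
                // A different replica was found: loop and read again.
            }
        }
    }

    /** In-memory reader for illustration; corrupt[i] marks replica i as bad. */
    public static class FakeReader implements ReplicaReader {
        private final boolean[] corrupt;
        private int current = 0;

        public FakeReader(boolean... corrupt) {
            this.corrupt = corrupt;
        }

        @Override
        public byte[] read() throws IOException {
            if (corrupt[current]) {
                throw new ChecksumMismatch("bad checksum on replica " + current);
            }
            return new byte[] {(byte) current};
        }

        @Override
        public boolean seekToNewSource() {
            if (current + 1 >= corrupt.length) {
                return false;
            }
            current++;
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = readWithRetry(new FakeReader(true, false), 3);
        System.out.println("read from replica " + data[0]);
    }
}
```

The bug as described inverts the condition in the catch block: it threw when a new replica was available and retried when none was.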

[jira] Updated: (HADOOP-1108) CLONE -NNBench generates millions of NotReplicatedYetException in Namenode log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1108:
-------------------------------------

    Status: Patch Available  (was: Open)

If an upload of a block of a checksum file fails, then try alternate datanodes. Code reviewed by Hairong.

[jira] Updated: (HADOOP-1108) Checksumed file system should retry reading if a different replica is found when handle ChecksumException

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1108:
----------------------------------

    Attachment:     (was: notyetreplciatedexception.patch)

[jira] Updated: (HADOOP-1108) Checksumed file system should retry reading if a different replica is found when handle ChecksumException

Posted by "Tom White (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-1108:
------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Hairong!
