Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2007/03/11 02:50:09 UTC
[jira] Created: (HADOOP-1108) CLONE -NNBench generates millions of
NotReplicatedYetException in Namenode log
CLONE -NNBench generates millions of NotReplicatedYetException in Namenode log
------------------------------------------------------------------------------
Key: HADOOP-1108
URL: https://issues.apache.org/jira/browse/HADOOP-1108
Project: Hadoop
Issue Type: Bug
Components: dfs
Affects Versions: 0.12.0
Reporter: dhruba borthakur
Assigned To: dhruba borthakur
Priority: Blocker
Fix For: 0.12.1
Running NNBench on latest trunk (0.12.1 candidate) on a few hundred nodes yielded 2.3 million of these exceptions in the NN log:
2007-03-08 09:23:03,053 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020 call error:
org.apache.hadoop.dfs.NotReplicatedYetException: Not replicated yet
    at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:803)
    at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:309)
    at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
I run NNBench to create files with block size set to 1 and replication set to 1. NNBench then writes 1 byte to each file. Minimum replication for the cluster is the default, i.e. 1. If NNBench encounters an exception during either the create or the write operation, it loops and tries again. Multiply this by 1000 files per node and a few hundred nodes.
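
To get a feel for how a bare retry loop reaches millions of log entries, here is a rough sketch. It is not NNBench code: the class, the method names, the node count of 300 (standing in for "a few hundred") and the 8 failed attempts per file are all assumptions chosen only to land in the same order of magnitude as the 2.3 million reported above.

```java
import java.util.function.IntPredicate;

// Illustrative only: models a client that retries immediately on failure.
final class RetryAmplification {

    // Retries until the (hypothetical) operation succeeds or maxAttempts is hit.
    // Returns the number of failed attempts; each one would correspond to one
    // exception logged on the namenode side.
    static int failedAttempts(IntPredicate succeedsOnAttempt, int maxAttempts) {
        int failures = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (succeedsOnAttempt.test(attempt)) {
                return failures;
            }
            failures++;
        }
        return failures;
    }

    // Total exceptions across the cluster for a given per-file failure count.
    static long totalExceptions(int nodes, int filesPerNode, int failuresPerFile) {
        return (long) nodes * filesPerNode * failuresPerFile;
    }

    public static void main(String[] args) {
        // Assumption: every file needs 8 failed attempts before the 9th succeeds.
        int perFile = failedAttempts(attempt -> attempt > 8, 100);
        // Assumption: 300 nodes, with the 1000 files per node from the report.
        System.out.println(totalExceptions(300, 1000, perFile));
    }
}
```

Under these assumed numbers the total is 2,400,000, so fewer than ten immediate retries per file would be enough to explain the log volume.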
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1108) Checksumed file system should retry
reading if a different replica is found when handle ChecksumException
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-1108:
----------------------------------
Assignee: Hairong Kuang (was: dhruba borthakur)
Description: Currently there is a bug in the code: when handling a ChecksumException, a checksummed file system throws an exception if a different replica is found and retries otherwise; it should do the opposite. (was: Running NNBench on latest trunk (0.12.1 candidate) on a few hundred nodes yielded 2.3 million of these exceptions in the NN log:
2007-03-08 09:23:03,053 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020 call error:
org.apache.hadoop.dfs.NotReplicatedYetException: Not replicated yet
at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:803)
at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:309)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
I run NNBench to create files with block size set to 1 and replication set to 1. NNBench then writes 1 byte to the file. Minimum replication for the cluster is the default, ie 1. If it encounters an exception while trying to do either the create or write operations, it loops and tries again. Multiply this by 1000 files per node and a few hundred nodes.
)
Summary: Checksumed file system should retry reading if a different replica is found when handle ChecksumException (was: CLONE -NNBench generates millions of NotReplicatedYetException in Namenode log)
> Checksumed file system should retry reading if a different replica is found when handle ChecksumException
> ----------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-1108
> URL: https://issues.apache.org/jira/browse/HADOOP-1108
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.12.0
> Reporter: dhruba borthakur
> Assigned To: Hairong Kuang
> Priority: Blocker
> Fix For: 0.12.1
>
> Attachments: notyetreplciatedexception.patch
>
>
> Currently there is a bug in the code: when handling a ChecksumException, a checksummed file system throws an exception if a different replica is found and retries otherwise; it should do the opposite.
[jira] Commented: (HADOOP-1108) Checksumed file system should
retry reading if a different replica is found when handle
ChecksumException
Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480194 ]
dhruba borthakur commented on HADOOP-1108:
------------------------------------------
+1. Code reviewed.
> Checksumed file system should retry reading if a different replica is found when handle ChecksumException
> ----------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-1108
> URL: https://issues.apache.org/jira/browse/HADOOP-1108
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.12.0
> Reporter: dhruba borthakur
> Assigned To: Hairong Kuang
> Priority: Blocker
> Fix For: 0.12.1
>
> Attachments: seekNewSource.patch
>
>
> Currently there is a bug in the code: when handling a ChecksumException, a checksummed file system throws an exception if a different replica is found and retries otherwise; it should do the opposite.
[jira] Updated: (HADOOP-1108) CLONE -NNBench generates millions of
NotReplicatedYetException in Namenode log
Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dhruba borthakur updated HADOOP-1108:
-------------------------------------
Attachment: notyetreplciatedexception.patch
If the checksum filesystem encounters an error while uploading a block to the first datanode, then it should try the other datanodes.
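
As a rough illustration of the "try the other datanodes" idea, and not the contents of the attached patch, the fallback could look like the sketch below. The BlockTarget interface and all names are hypothetical stand-ins for whatever the DFS client actually uses to talk to a datanode.

```java
import java.io.IOException;
import java.util.List;

// Illustrative only: fall back to alternate targets when an upload fails.
final class AlternateDatanodeUpload {

    // Hypothetical stand-in for a connection to one datanode.
    interface BlockTarget {
        void upload(byte[] block) throws IOException;
    }

    // Tries each target in order and returns the index of the first one that
    // accepts the block. Throws only after every target has failed.
    static int uploadToFirstAvailable(List<BlockTarget> targets, byte[] block)
            throws IOException {
        IOException last = null;
        for (int i = 0; i < targets.size(); i++) {
            try {
                targets.get(i).upload(block);
                return i;
            } catch (IOException e) {
                last = e; // remember the failure and move on to the next target
            }
        }
        throw new IOException("all " + targets.size() + " targets failed", last);
    }
}
```

The point of the design is that a failure on the first datanode is reported to the caller only when no alternate accepts the block, so a single bad node does not trigger a client-side retry loop.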
[jira] Updated: (HADOOP-1108) Checksumed file system should retry
reading if a different replica is found when handle ChecksumException
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-1108:
----------------------------------
Attachment: seekNewSource.patch
I fixed the described bug plus a couple of minor bugs in ChecksumFileSystem.FSInputChecker.
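
For readers without the patch at hand, the intended behaviour from the issue summary can be sketched as follows. This is an assumption-laden illustration, not the code in seekNewSource.patch or in FSInputChecker: ChecksumMismatch, ReplicatedInput and switchToOtherReplica are hypothetical names invented for the example.

```java
import java.io.IOException;

// Illustrative only: retry a read when a different replica is available.
final class ChecksumRetryRead {

    // Hypothetical stand-in for a checksum failure (not Hadoop's own class).
    static final class ChecksumMismatch extends IOException {
        ChecksumMismatch(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for an input stream backed by several replicas.
    interface ReplicatedInput {
        int read(long pos) throws IOException;   // may throw ChecksumMismatch
        boolean switchToOtherReplica(long pos);  // true if a different replica was found
    }

    // Intended behaviour per the issue summary: retry when a different replica
    // is found, and give up otherwise. The reported bug had this condition
    // the wrong way round.
    static int readWithRetry(ReplicatedInput in, long pos, int maxRetries)
            throws IOException {
        int retriesLeft = maxRetries;
        while (true) {
            try {
                return in.read(pos);
            } catch (ChecksumMismatch e) {
                if (retriesLeft-- <= 0 || !in.switchToOtherReplica(pos)) {
                    throw e; // no other replica (or retries exhausted): surface the error
                }
            }
        }
    }
}
```

The retry cap is an addition for the sketch so that the loop terminates even if every replica is corrupt.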
[jira] Updated: (HADOOP-1108) CLONE -NNBench generates millions of
NotReplicatedYetException in Namenode log
Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dhruba borthakur updated HADOOP-1108:
-------------------------------------
Status: Patch Available (was: Open)
If an upload of a block of a checksum file fails, then try alternate datanodes. Code reviewed by Hairong.
[jira] Updated: (HADOOP-1108) Checksumed file system should retry
reading if a different replica is found when handle ChecksumException
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-1108:
----------------------------------
Attachment: (was: notyetreplciatedexception.patch)
[jira] Updated: (HADOOP-1108) Checksumed file system should retry
reading if a different replica is found when handle ChecksumException
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White updated HADOOP-1108:
------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
I've just committed this. Thanks Hairong!