Posted to common-dev@hadoop.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2009/03/27 01:19:50 UTC
[jira] Created: (HADOOP-5588) hadoop commands seem extremely slow
in 0.20 branch
hadoop commands seem extremely slow in 0.20 branch
--------------------------------------------------
Key: HADOOP-5588
URL: https://issues.apache.org/jira/browse/HADOOP-5588
Project: Hadoop Core
Issue Type: Bug
Components: dfs, fs
Environment: 0.20-branch and trunk
Reporter: Koji Noguchi
Priority: Blocker
hadoop dfs -get/rm/mkdir/etc mydir/fileA mydir/fileB mydir/fileC ...
seem to be very slow in 0.20 branch.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-5588) hadoop commands seem extremely slow
in 0.20 branch
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693864#action_12693864 ]
Hairong Kuang commented on HADOOP-5588:
---------------------------------------
Koji ran some experiments with the patch but is too busy to post the results, so I am posting them for him.
Setup: a directory containing 10,000 files.
About 450 mappers, each calling dfs -get 10,000 times.
Without the fix, the namenode showed 20-30 getblocklocations calls per second and 30-40 blocked threads.
With the fix, it showed 600 getblocklocations calls per second and almost no blocked threads.
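To put those rates in perspective, here is a rough estimate of how long the whole run would take. It assumes one getblocklocations call per dfs -get and that the quoted rates hold for the entire run; neither assumption is stated in the results above, and the class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate only.
// ASSUMPTIONS: one getblocklocations call per "dfs -get", steady rates.
final class GetThroughputEstimate {
    /** Whole seconds needed to serve totalCalls at a steady callsPerSecond. */
    static long secondsToServe(long totalCalls, long callsPerSecond) {
        return totalCalls / callsPerSecond;
    }

    public static void main(String[] args) {
        long totalCalls = 450L * 10_000L; // mappers x gets per mapper
        System.out.println("total gets:        " + totalCalls);
        System.out.println("without fix, 20/s: " + secondsToServe(totalCalls, 20) + " s");
        System.out.println("without fix, 30/s: " + secondsToServe(totalCalls, 30) + " s");
        System.out.println("with fix, 600/s:   " + secondsToServe(totalCalls, 600) + " s");
    }
}
```

Under these assumptions the 4.5 million gets need 150,000 to 225,000 seconds (roughly 42 to 62 hours) without the fix and 7,500 seconds (about 2 hours) with it.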
> hadoop commands seem extremely slow in 0.20 branch
> --------------------------------------------------
>
> Key: HADOOP-5588
> URL: https://issues.apache.org/jira/browse/HADOOP-5588
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs, fs
> Affects Versions: 0.20.0
> Environment: 0.20-branch and trunk
> Reporter: Koji Noguchi
> Assignee: Hairong Kuang
> Priority: Blocker
> Fix For: 0.20.0, 0.21.0
>
> Attachments: globStatus.patch, globStatus1.patch
>
>
> hadoop dfs get, rm, -mkdir- ,cp, mv, ls, etc mydir/fileA mydir/fileB mydir/fileC ...
> seem to be very slow in 0.20 branch.
>
[jira] Commented: (HADOOP-5588) hadoop commands seem extremely slow
in 0.20 branch
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690016#action_12690016 ]
Hairong Kuang commented on HADOOP-5588:
---------------------------------------
I was not able to run all unit tests, but all fs/dfs-related unit tests passed.
[jira] Resolved: (HADOOP-5588) hadoop commands seem extremely slow
in 0.20 branch
Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz Wo (Nicholas), SZE resolved HADOOP-5588.
--------------------------------------------
Resolution: Fixed
Fix Version/s: 0.21.0
I have committed this to 0.20 and above. Thanks, Hairong!
[jira] Updated: (HADOOP-5588) hadoop commands seem extremely slow
in 0.20 branch
Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz Wo (Nicholas), SZE updated HADOOP-5588:
-------------------------------------------
Hadoop Flags: [Reviewed]
{noformat}
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no tests are needed for this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
{noformat}
[jira] Commented: (HADOOP-5588) hadoop commands seem extremely slow
in 0.20 branch
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695422#action_12695422 ]
Hudson commented on HADOOP-5588:
--------------------------------
Integrated in Hadoop-trunk #796 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/796/])
[jira] Updated: (HADOOP-5588) hadoop commands seem extremely slow
in 0.20 branch
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-5588:
----------------------------------
Attachment: globStatus1.patch
This patch fixes a bug in the previous patch.
[jira] Commented: (HADOOP-5588) hadoop commands seem extremely slow
in 0.20 branch
Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690018#action_12690018 ]
Tsz Wo (Nicholas), SZE commented on HADOOP-5588:
------------------------------------------------
+1 patch looks good.
I tested manually and ran some related tests. Everything worked fine.
[jira] Updated: (HADOOP-5588) hadoop commands seem extremely slow
in 0.20 branch
Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi updated HADOOP-5588:
---------------------------------
Description:
hadoop dfs get, rm, -mkdir- ,cp, mv, ls, etc mydir/fileA mydir/fileB mydir/fileC ...
seem to be very slow in 0.20 branch.
was:
hadoop dfs -get/rm/mkdir/etc mydir/fileA mydir/fileB mydir/fileC ...
seem to be very slow in 0.20 branch.
[jira] Updated: (HADOOP-5588) hadoop commands seem extremely slow
in 0.20 branch
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-5588:
----------------------------------
Affects Version/s: 0.20.0
Fix Version/s: 0.20.0
Assignee: Hairong Kuang
A suspect is HADOOP-3497, which introduced a listing call on the parent directory in globStatus regardless of whether the path contains a glob. One of our users calls "dfs -get" on many small files under one dir. This has the same effect as calling dfs -ls many times on a large directory, which causes the NN to do a lot of GC and makes it less responsive.
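The cost difference can be illustrated with a small sketch. This is not Hadoop code: ListingCostSketch is a hypothetical in-memory stand-in for one namenode directory, and entriesReturned is an invented counter for how many entries each kind of call has to hand back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a single directory; not Hadoop code.
final class ListingCostSketch {
    private final TreeSet<String> children = new TreeSet<>();
    long entriesReturned = 0; // entries the "namenode" had to hand back so far

    void addFile(String name) {
        children.add(name);
    }

    /** Listing call: materializes a fresh list of every child. */
    List<String> list() {
        entriesReturned += children.size();
        return new ArrayList<>(children);
    }

    /** Direct lookup: touches only the requested entry. */
    boolean exists(String name) {
        entriesReturned += 1;
        return children.contains(name);
    }

    /** Resolves a literal name by listing the parent and filtering the result. */
    boolean resolveViaListing(String name) {
        return list().contains(name);
    }

    public static void main(String[] args) {
        ListingCostSketch dir = new ListingCostSketch();
        for (int i = 0; i < 10_000; i++) {
            dir.addFile("file" + i);
        }

        dir.resolveViaListing("file42");
        System.out.println("via listing: " + dir.entriesReturned + " entries");

        dir.entriesReturned = 0;
        dir.exists("file42");
        System.out.println("direct:      " + dir.entriesReturned + " entries");
    }
}
```

In this toy model every lookup through the listing allocates and discards a list as large as the directory, which is the kind of short-lived garbage that would plausibly add GC pressure on a real namenode.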
[jira] Updated: (HADOOP-5588) hadoop commands seem extremely slow
in 0.20 branch
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-5588:
----------------------------------
Attachment: globStatus.patch
This patch restores pre-0.20.0 behavior.
[jira] Commented: (HADOOP-5588) hadoop commands seem extremely slow
in 0.20 branch
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690014#action_12690014 ]
Hairong Kuang commented on HADOOP-5588:
---------------------------------------
Manual tests on dfs -ls/get etc. showed that the patch removed the additional listing call to the parent directory if the input path did not contain a glob.
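A minimal sketch of that behavior, for illustration only. It is not the attached patch: the Namespace interface, hasGlob, toRegex and resolve are hypothetical names, and only the * and ? wildcards are handled, while the real glob syntax has more forms that this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Simplified illustration; NOT the actual globStatus patch.
final class GlobFastPathSketch {
    /** Hypothetical minimal namespace so that listing calls can be counted. */
    interface Namespace {
        boolean exists(String path);   // single-entry lookup
        List<String> list(String dir); // full listing of child names
    }

    /** True if the name contains '*' or '?' (the only wildcards handled here). */
    static boolean hasGlob(String name) {
        return name.indexOf('*') >= 0 || name.indexOf('?') >= 0;
    }

    /** Translates a '*' / '?' glob into an equivalent regular expression. */
    static String toRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return sb.toString();
    }

    /** Lists the parent directory only when the name really is a pattern. */
    static List<String> resolve(Namespace ns, String dir, String name) {
        List<String> matches = new ArrayList<>();
        if (!hasGlob(name)) {
            // Fast path: literal name, no listing call on the parent.
            if (ns.exists(dir + "/" + name)) matches.add(dir + "/" + name);
            return matches;
        }
        Pattern pattern = Pattern.compile(toRegex(name));
        for (String child : ns.list(dir)) {
            if (pattern.matcher(child).matches()) matches.add(dir + "/" + child);
        }
        return matches;
    }

    public static void main(String[] args) {
        final int[] listCalls = {0};
        final List<String> files = Arrays.asList("fileA", "fileB", "fileC");
        Namespace ns = new Namespace() {
            public boolean exists(String path) {
                return files.contains(path.substring(path.lastIndexOf('/') + 1));
            }
            public List<String> list(String dir) {
                listCalls[0]++;
                return files;
            }
        };
        System.out.println(resolve(ns, "mydir", "fileA") + " listings=" + listCalls[0]);
        System.out.println(resolve(ns, "mydir", "file?") + " listings=" + listCalls[0]);
    }
}
```

The literal lookup leaves the listing counter at zero; only the pattern lookup triggers a listing of the parent.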
[jira] Issue Comment Edited: (HADOOP-5588) hadoop commands seem
extremely slow in 0.20 branch
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689759#action_12689759 ]
Hairong Kuang edited comment on HADOOP-5588 at 3/27/09 10:02 AM:
-----------------------------------------------------------------
A suspect is HADOOP-3497, which introduced a listing call on the parent directory in globStatus regardless of whether the path contains a glob. One of our users calls "dfs -get" on many small files under one large directory. This has the same effect as calling dfs -ls many times on the large directory, which causes the NN to do a lot of GC and makes it less responsive.
was (Author: hairong):
A suspect is HADOOP-3497, which introduced a listing call on the parent directory in globStatus regardless of whether the path contains a glob. One of our users calls "dfs -get" on many small files under one dir. This has the same effect as calling dfs -ls many times on a large directory, which causes the NN to do a lot of GC and makes it less responsive.